(54) Sequence-determined DNA fragments and corresponding polypeptides encoded thereby 



(57) The present invention provides DNA molecules 
that constitute fragments of the genome of a plant, and 
polypeptides encoded thereby. The DNA molecules are 
useful for specifying a gene product in cells, either as a 
promoter or as a protein coding sequence or as a UTR 
or as a 3' termination sequence, and are also useful in 
controlling the behavior of a gene in the chromosome, 
in controlling the expression of a gene or as tools for 
genetic mapping, recognizing or isolating identical or 
related DNA fragments, or identification of a particular 
individual organism, or for clustering of a group of 
organisms with a common trait. 

Arabidopsis DNA is used in the present experiment, 
but the procedure is a general one. 
Description 

FIELD OF THE INVENTION 

[0001] The present invention relates to isolated polynucleotides that represent a complete gene, or a fragment there- 
of, that is expressed. In addition, the present invention relates to the polypeptide or protein corresponding to the coding 
sequence of these polynucleotides. The present invention also relates to isolated polynucleotides that represent reg- 
ulatory regions of genes. The present invention also relates to isolated polynucleotides that represent untranslated 
regions of genes. The present invention further relates to the use of these isolated polynucleotides and polypeptides 
and proteins. 

DESCRIPTION OF THE RELATED ART 

[0002] Efforts to map and sequence the genome of a number of organisms are in progress; a few complete genome 
sequences, for example those of E. coli and Saccharomyces cerevisiae, are known (Blattner et al., Science 277:1453 
(1997); Goffeau et al., Science 274:546 (1996)). The complete genome of a multicellular organism, C. elegans, has 
also been sequenced (See, the C. elegans Sequencing Consortium, Science 282:2012 (1998)). To date, no complete 
genome of a plant has been sequenced, nor has a complete cDNA complement of any plant been sequenced. 

SUMMARY OF THE INVENTION 

[0003] The present invention comprises polynucleotides, such as complete cDNA sequences and/or sequences of 
genomic DNA encompassing complete genes, fragments of genes, and/or regulatory elements of genes and/or regions 
with other functions and/or intergenic regions, hereinafter collectively referred to as Sequence-Determined DNA 
Fragments (SDFs), from different plant species, particularly corn, wheat, soybean, rice and Arabidopsis thaliana, and other 
plants, and/or mutants, variants, fragments or fusions of said SDFs and polypeptides or proteins derived therefrom. In 
some instances, the SDFs span the entirety of a protein-coding segment. In some instances, the entirety of an mRNA 
is represented. Other objects of the invention that are also represented by SDFs of the invention are control sequences, 
such as, but not limited to, promoters. Complements of any sequence of the invention are also considered part of the 
invention. 

[0004] Other objects of the invention are polynucleotides comprising exon sequences, polynucleotides comprising 
intron sequences, polynucleotides comprising introns together with exons, intron/exon junction sequences, 5' 
untranslated sequences, and 3' untranslated sequences of the SDFs of the present invention. Polynucleotides representing 
the joinder of any exons described herein, in any arrangement, for example, to produce a sequence encoding any 
desirable amino acid sequence are within the scope of the invention. 

[0005] The present invention also resides in probes useful for isolating and identifying nucleic acids that hybridize 
to an SDF of the invention. The probes can be of any length, but more typically are 12-2000 nucleotides in length; 
more typically, 15 to 200 nucleotides long; even more typically, 18 to 100 nucleotides long. 

[0006] Yet another object of the invention is a method of isolating and/or identifying nucleic acids using the following 
steps: 

(a) contacting a probe of the instant invention with a polynucleotide sample under conditions that permit 
hybridization and formation of a polynucleotide duplex; and 

(b) detecting and/or isolating the duplex of step (a). 

[0007] The conditions for hybridization can be from low to moderate to high stringency conditions. The sample can 
include a polynucleotide having a sequence unique in a plant genome. Probes and methods of the invention are useful, 
for example, without limitation, for mapping of genetic traits and/or for positional cloning of a desired fragment of 
genomic DNA. 

[0008] Probes and methods of the invention can also be used for detecting alternatively spliced messages within a 
species. Probes and methods of the invention can further be used to detect or isolate related genes in other plant 
species using genomic DNA (gDNA) and/or cDNA libraries. In some instances, especially when longer probes and low 
to moderate stringency hybridization conditions are used, the probe will hybridize to a plurality of cDNA and/or gDNA 
sequences of a plant. This approach is useful for isolating representatives of gene families which are identifiable by 
possession of a common functional domain in the gene product or which have common cis-acting regulatory sequences. 
This approach is also useful for identifying orthologous genes from other organisms. 

[0009] The present invention also resides in constructs for modulating the expression of the genes comprised of all 
or a fragment of an SDF. The constructs comprise all or a fragment of the expressed SDF, or of a complementary 
[...] promoter, introns, untranslated regions [...] comprising 
[...] DNA and chromatin conformation [...] 
plasmid, bacterial artificial chromosomes [...] constructed using viral, 
plant artificial chromosomes [...] plant plasmids 
as DNA integrated into the genome. [...] 
with, or operatively linked to, a heterologous [...] 
operably linked to a promoter that is functional in a plant [...] SDF might be [...] 

[0010] The present invention also resides in host [...] 
that harbor constructs such as described [...] 
expression of specific genes in plants by [...] 
[...]sion of one or more endogenous genes [...] 
in a plant. Methods of modulation of gene expression include [...] 
copies of a polynucleotide comprising a coding sequence [...] a host cell additional 
inserting antisense or ribozyme constructs into a host cell [...] (3) 
a sequence encoding a [...] fragment or fusion [...] 
BRIEF DESCRIPTION OF THE TABLES 

[0011] [...] Tables 1 and 2. The REF Tables refer to a number of "Maximum Length Sequences" or "MLS." Each MLS 
corresponds to the longest cDNA obtained, either by cloning or [...] 

[0012] The REF Table includes the following information relating to each MLS: 
I. cDNA Sequence 



A. 5' UTR 

B. Coding Sequence 

C. 3' UTR 



II. Genomic Sequence 

A. Exons 

B. Introns 

C. Promoters 

III. Link of cDNA Sequences to Clone IDs 

IV. Multiple Transcription Start Sites 

V. Polypeptide Sequences 

A. Signal Peptide 

B. Domains 

C. Related Polypeptides 



VI. Related Polynucleotide Sequences 
I. cDNA SEQUENCE 

A. 5' UTR 
genomic sequence as indicated in the REF Tables. The sequence that matches, beginning at any of the transcriptional 
start sites and ending at the last nucleotide before any of the translational start sites corresponds to the 5' UTR. 

B. Coding Region 

[0015] The coding region is the sequence in any open reading frame found in the MLS. Coding regions of interest 
are indicated in the Poly P SEQ subsection of the REF Tables. 

C. 3' UTR 

[0016] The location of the 3' UTR can be determined by comparing the most 3' MLS sequence with the corresponding 
genomic sequence as indicated in the REF Tables. The sequence that matches, beginning at the translational stop 
site and ending at the last nucleotide of the MLS corresponds to the 3' UTR. 
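
The UTR and coding-region definitions above amount to slicing the MLS at the translational start and stop sites. The following is a minimal Python sketch of that slicing, offered only as an illustration. The function name `split_mls` and its parameters are hypothetical, positions are assumed to be 1-based, and the 3' UTR is assumed to begin immediately after the three-base stop codon, which is one possible reading of "beginning at the translational stop site".

```python
def split_mls(mls, start_codon_pos, stop_codon_pos, tss_pos=1):
    """Split an MLS into (5' UTR, coding region, 3' UTR).

    All positions are 1-based (an assumption of this sketch).
    start_codon_pos: first base of the translational start codon.
    stop_codon_pos:  first base of the stop codon.
    tss_pos:         transcription start site within the MLS.
    """
    if not (1 <= tss_pos <= start_codon_pos <= stop_codon_pos <= len(mls) - 2):
        raise ValueError("positions are out of order or out of range")
    # 5' UTR: from the transcription start site to the base before the start codon.
    five_utr = mls[tss_pos - 1:start_codon_pos - 1]
    # Coding region: start codon through the stop codon (assumed included here).
    coding = mls[start_codon_pos - 1:stop_codon_pos + 2]
    # 3' UTR: everything after the stop codon to the last nucleotide of the MLS.
    three_utr = mls[stop_codon_pos + 2:]
    return five_utr, coding, three_utr


if __name__ == "__main__":
    example = "GGCC" + "ATGAAATAA" + "TTTT"
    print(split_mls(example, start_codon_pos=5, stop_codon_pos=11))
```

Passing a different `tss_pos` models the case of multiple transcription start sites, each of which yields its own 5' UTR.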

II. GENOMIC SEQUENCE 

[0017] Further, the REF Tables indicate the specific gi number of the genomic sequence if the sequence resides in 
a public databank. For each genomic sequence, the REF Tables indicate which regions are included in the MLS. These 
regions can include the 5' and 3' UTRs as well as the coding sequence of the MLS. See, for example, the scheme below: 



[Scheme: Promoter | Region 1 (5' UTR, Translational Start Site, Exon) | Intron | Region 2 (Exon) | Intron | Region 3 (Exon, Stop Codon, 3' UTR)] 



[0018] The REF Tables report the first and last base of each region that are included in an MLS sequence. An example 
is shown below: 

gi No. 47000: 

37102 ... 37497 

37593 ... 37925 

The numbers indicate that the MLS contains the following sequences from two regions of gi No. 47000: a first region 
including bases 37102-37497, and a second region including bases 37593-37925. 
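
The region listing above can be read mechanically. The following minimal Python sketch, which is illustrative only and not part of the REF Tables, parses lines of the form "first ... last" and computes the length of each region. The function names `parse_regions` and `region_length` are hypothetical.

```python
import re

# One region per line, written as "first ... last" as in the example above.
REGION_LINE = re.compile(r"\s*(\d+)\s*\.\.\.\s*(\d+)\s*")


def parse_regions(text):
    """Return a list of (first_base, last_base) tuples, one per region line."""
    regions = []
    for line in text.splitlines():
        match = REGION_LINE.fullmatch(line)
        if match:
            regions.append((int(match.group(1)), int(match.group(2))))
    return regions


def region_length(first_base, last_base):
    """Number of bases in a region; both end points are inclusive."""
    return abs(last_base - first_base) + 1


if __name__ == "__main__":
    listing = "37102 ... 37497\n37593 ... 37925"
    regions = parse_regions(listing)
    print(regions)
    print(sum(region_length(first, last) for first, last in regions))
```

For the example of gi No. 47000, the two regions contribute 396 and 333 bases respectively.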

A. EXON SEQUENCES 

[0019] The location of the exons can be determined by comparing the sequence of the regions from the genomic 
sequences with the corresponding MLS sequence as indicated by the REF Tables. 

i. INITIAL EXON 

[0020] To determine the location of the initial exon, information from the 

(1) polypeptide sequence section; 

(2) cDNA polynucleotide section; and 

(3) the genomic sequence section 

of the REF Tables is used. First, the polypeptide section will indicate where the translational start site is located in 
the MLS sequence. The MLS sequence can be matched to the genomic sequence that corresponds to the MLS. Based 
on the match between the MLS and corresponding genomic sequences, the location of the translational start site can 
be determined in one of the regions of the genomic sequence. The location of this translational start site is the start of 
the first exon. 

[0021] Generally, the last base of the exon of the corresponding genomic region, in which the translational start site 
was located, will represent the end of the initial exon. In some cases, the initial exon will end with a stop codon, when 
the initial exon is the only exon. 

[0022] In the case when sequences representing the MLS are in the positive strand of the corresponding genomic 
sequence, the last base will be a larger number than the first base. When the sequences representing the MLS are in 
the negative strand of the corresponding genomic sequence, then the last base will be a smaller number than the first base. 



ii. INTERNAL EXONS 



[0023] [...] regions that match the MLS sequence are the internal exons. Specifically, the bases defining the boundaries 
of the remaining regions also define the intron/exon junctions of the internal exons. 



iii. TERMINAL EXON 



[0024] As with the initial exon, the location of the terminal exon is determined with information from the 



(1) polypeptide sequence section; 

(2) cDNA polynucleotide section; and 

(3) the genomic sequence section 



of the REF Tables. The polypeptide section will indicate where the stop codon is located in the MLS sequence. The 
MLS sequence can be matched to the corresponding genomic sequence. Based on the match between MLS and 
corresponding genomic sequences, the location of the stop codon can be determined in one of the regions of the 
genomic sequence. The location of this stop codon is [...] 
of the corresponding genomic region that matches the cDNA sequence, in which the stop codon is located [...] 

[0025] In the case when the MLS sequences are in the positive strand of the corresponding genomic sequence, the 
last base will be a larger number than the first base. When the MLS sequences are in the negative strand of the 
corresponding genomic sequence, then the last base will be a smaller number than the first base. 

B. INTRON SEQUENCES 

[0026] The introns corresponding to the MLS are defined by identifying the genomic sequence located 
between the regions where the genomic sequence comprises exons. Thus, introns are defined as starting one base [...] 
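
The intron definition above, together with the strand convention for first and last bases, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the procedure used to build the REF Tables: the function name `introns_between` is hypothetical, regions are assumed to be listed in transcript order as (first base, last base) pairs, and each intron is assumed to run from the base after one exon region to the base before the next.

```python
def introns_between(regions):
    """Derive intron coordinates from consecutive exon-containing regions.

    regions: list of (first_base, last_base) tuples in transcript order.
    On the positive strand last_base > first_base; on the negative strand
    last_base < first_base, as described in the text.
    """
    introns = []
    for (_, last_prev), (first_next, _) in zip(regions, regions[1:]):
        if last_prev < first_next:
            # Positive strand: coordinates increase along the transcript.
            if first_next - last_prev > 1:
                introns.append((last_prev + 1, first_next - 1))
        else:
            # Negative strand: coordinates decrease along the transcript.
            if last_prev - first_next > 1:
                introns.append((last_prev - 1, first_next + 1))
    return introns


if __name__ == "__main__":
    print(introns_between([(37102, 37497), (37593, 37925)]))
```

Applied to the two regions of gi No. 47000 given earlier, the sketch reports a single intron between bases 37498 and 37592.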

C. PROMOTER SEQUENCES 

[0027] Promoter sequences corresponding to the MLS are defined as sequences upstream of [...] 

III. LINK of cDNA SEQUENCES to CLONE IDs 

[0028] [...] to each MLS. The MLS sequence can [...] 

IV. Multiple Transcription Start Sites 

[0029] Initiation of transcription can occur at a number of sites of the gene. The REF tables indicate the 
multiple transcription sites for each gene. In the REF tables, the location of the transcription start sites [...] 
[...] to the transcription start sites [...] 
located in the MLS sequence. The negative numbers indicate the transcription start site within the genomic sequence 
that corresponds to the MLS. 

[0030] To determine the location of the transcription start sites with the negative numbers, the MLS sequence is 
aligned with the corresponding genomic sequence. In the instances when a public genomic sequence is referenced, 
the relevant corresponding genomic sequence can be found by direct reference to the nucleotide sequence indicated 
by the gi number shown in the public genomic DNA section of the REF tables. When the position is a negative number, 
the transcription start site is located in the corresponding genomic sequence upstream of the base that matches the 
beginning of the MLS sequence in the alignment. The negative number is relative to the first base of the MLS sequence 
which matches the genomic sequence corresponding to the relevant gi number. 
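
As an illustration of the paragraph above, a negative transcription start site position can be converted to a genomic coordinate once the alignment gives the genomic position of the first MLS base. The sketch below is hypothetical: the function name `tss_genomic_position` is not from the REF Tables, and it assumes that an offset of -1 denotes the base immediately upstream of the first MLS base and that "upstream" means lower coordinates on the positive strand and higher coordinates on the negative strand.

```python
def tss_genomic_position(mls_first_base, offset, strand="+"):
    """Map a negative TSS offset to a coordinate in the genomic sequence.

    mls_first_base: genomic coordinate matching the first base of the MLS.
    offset:         negative number reported for the transcription start site.
    strand:         "+" or "-" strand of the genomic sequence.
    """
    if offset >= 0:
        raise ValueError("this sketch only handles negative (upstream) offsets")
    if strand == "+":
        return mls_first_base + offset  # upstream = smaller coordinate
    if strand == "-":
        return mls_first_base - offset  # upstream = larger coordinate
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    print(tss_genomic_position(37102, -25))
    print(tss_genomic_position(500, -25, strand="-"))
```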

[0031] In the instances when no public genomic DNA is referenced, the relevant nucleotide sequence for alignment 
is the nucleotide sequence associated with the amino acid sequence designated by gi number of the later PolyP SEQ 
subsection. 



V. Polypeptide Sequences 

[0032] The PolyP SEQ subsection lists SEQ ID NOs and Ceres SEQ ID NO for polypeptide sequences corresponding 
to the coding sequence of the MLS sequence and the location of the translational start site with the coding sequence 
of the MLS sequence. 

[0033] The MLS sequence can have multiple translational start sites and can be capable of producing more than 
one polypeptide sequence. 

A. Signal Peptide 



[0034] The REF Tables also indicate in subsection (B) the cleavage site of the putative signal peptide of the polypep- 
tide corresponding to the coding sequence of the MLS sequence. Typically, signal peptide coding sequences comprise 
a sequence encoding the first residue of the polypeptide to the cleavage site residue. 



B. Domains 



[0035] Subsection (C) provides information regarding identified domains (where present) within the polypeptide and 
(where present) a name for the polypeptide domain. 



C. Related Polypeptides 



[0036] Subsection (Dp) provides (where present) information concerning amino acid sequences that are found to be 
related and have some percentage of sequence identity to the polypeptide sequences of REF and SEQ TABLES 1 
AND 2. These related sequences are identified by a gi number. 
VI. Related Polynucleotide Sequences 

[0037] Subsection (Dn) provides polynucleotide sequences (where present) that are related to and have some 
percentage of sequence identity to the MLS or corresponding genomic sequence. 



Abbreviation            Description 

Max Len. Seq.           Maximum Length Sequence 
rel to                  Related to 
Clone Ids               Clone ID numbers 
Pub gDNA                Public Genomic DNA 
gi No.                  gi number 
Gen. seq. in cDNA       Genomic Sequence in cDNA (Each region for a single gene prediction is 
                        listed on a separate line. In the case of multiple gene predictions, the 
                        group of regions relating to a single prediction are separated by a blank line) 
(Ac) cDNA SEQ           cDNA sequence 
(continued) 

Abbreviation                        Description 

- Pat. Appln. SEQ ID NO             Patent Application SEQ ID NO: 
- Ceres SEQ ID NO: 1673877          Ceres SEQ ID NO: 
SEQ # w. TSS 
- Clone ID #: # -> #                Clone ID comprises bases # to # of the cDNA Sequence 
PolyP SEQ                           Polypeptide Sequence 
- Pat. Appln. SEQ ID NO:            Patent Application SEQ ID NO: 
- Ceres SEQ ID NO                   Ceres SEQ ID NO: 
- Loc. SEQ ID NO: @ nt.             Location of translational start site in cDNA of SEQ ID NO: at nucleotide 
(C) Pred. PP Nom. & Annot.          Nomination and Annotation of Domains within Predicted Polypeptide(s) 
- (Title)                           Name of Domain 
- Loc. SEQ ID NO #: # -> # aa.      Location of the domain within the polypeptide of SEQ ID NO: from # to # 
                                    amino acid residues. 
(Dp) Rel. AA SEQ                    Related Amino Acid Sequences 
- Align. NO                         Alignment number 
- gi No                             Gi number 
- Desp.                             Description 
- % Idnt.                           Percent identity 
- Align. Len.                       Alignment Length 
- Loc. SEQ ID NO: # -> # aa         Location within SEQ ID NO: from # to # amino acid residue. 



DETAILED DESCRIPTION OF THE INVENTION 

[0038] The invention relates to (I) polynucleotides and methods of use thereof, such as 

IA. Probes, Primers and Substrates; 

IB. Methods of Detection and Isolation; 

B.1. Hybridization; 

B.2. Methods of Mapping; 

B.3. Southern Blotting; 

B.4. Isolating cDNA from Related Organisms; 

B.5. Isolating and/or Identifying Orthologous Genes 

IC. Methods of Inhibiting Gene Expression 

C.1. Antisense 

C.2. Ribozyme Constructs; 

C.3. Chimeraplasts; 

C.4. Co-Suppression; 

C.5. Transcriptional Silencing 

C.6. Other Methods to Inhibit Gene Expression 

ID. Methods of Functional Analysis; 

IE. Promoter Sequences and Their Use; 

IF. UTRs and/or Intron Sequences and Their Use; and 

IG. Coding Sequences and Their Use. 

[0039] The invention also relates to (II) polypeptides and proteins and methods of use thereof, such as 

IIA. Native Polypeptides and Proteins 

A.1 Antibodies 
A.2 In Vitro Applications 

IIB. Polypeptide Variants, Fragments and Fusions 

B.1 Variants 
B.2 Fragments 
B.3 Fusions 

[0040] The invention also includes (III) methods of modulating polypeptide production, such as 

IIIA. Suppression 

A.1 Antisense 
A.2 Ribozymes 
A.3 Co-suppression 
A.4 Insertion of Sequences into the Gene to be Modulated 
A.5 Promoter Modulation 
A.6 Expression of Genes containing Dominant-Negative Mutations 

IIIB. Enhanced Expression 

B.1 Insertion of an Exogenous Gene 
B.2 Promoter Modulation 

[0041] The invention further concerns (IV) gene constructs and vector construction, such as 

IVA. Coding Sequences 
IVB. Promoters 
IVC. Signal Peptides 

[0042] The invention still further relates to 

V. Transformation Techniques 

Definitions 



[0043] Allelic variant: An "allelic variant" is an alternative form of the same SDF, which resides at the same 
chromosomal locus in the organism. Allelic variations can occur in any portion of the gene sequence, including regulatory 
regions. Allelic variants can arise by normal genetic variation in a population. Allelic variants can also be produced by 
genetic engineering methods. An allelic variant can be one that is found in a naturally occurring plant, including a 
cultivar or ecotype. An allelic variant may or may not give rise to a phenotypic change, and may or may not be expressed. 
An allele can result in a detectable change in the phenotype of the trait represented by the locus. A phenotypically 
silent allele can give rise to a product. 

[0044] Alternatively spliced messages: Within the context of the current invention, "alternatively spliced 
messages" refers to mature mRNAs originating from a single gene with variations in the number and/or identity of exons, 
introns and/or intron-exon junctions. 

[0045] Chimeric: The term "chimeric" is used to describe genes, as defined supra, or constructs wherein at least 
two of the elements of the gene or construct, such as the promoter and the coding sequence and/or other regulatory 
sequences and/or filler sequences and/or complements thereof, are heterologous to each other. 

[0046] Constitutive Promoter: Promoters referred to herein as "constitutive promoters" actively promote transcription 
under most, but not necessarily all, environmental conditions and states of development or cell differentiation. Examples 
[...] region and the 1' or 2' [...] genes, such as the [...] 

[...] stage and/or under the same or similar environmental conditions. 

[...] and/or (3) three-dimensional [...] proteins or motifs. Typically, these families and/or motifs have been 
correlated with specific in vitro and/or in vivo activities. A domain can be any length, including the entirety 
of the [...]. Detailed descriptions of the domains, associated families and motifs [...] polypeptides of the instant 
invention are described below. Usually, the polypeptides [...] any polypeptide that comprises the [...] 

[0050] Exogenous: "Exogenous" [...] refers to [...] whether chimeric or not, [...] organism regenerated from 
said host cell [...] this can be accomplished as described [...] Salomon et al. [...] monocots, representative 
papers are those by Escudero et al., Plant J. 10:355 (1996), [...] Current Genetics [...] 1990)), electroporation, 
in planta techniques, and the like. Such a plant containing the [...] the primary transgenic plant [...] 
encompass inserting a naturally found element [...] 

[...] and may provide an [...] (see SCHEMATIC 1). Genes can include non-coding [...] specify polyadenylation, [...] 
conformation, extent and position of base methylation and binding sites [...] "exons" (coding sequences), 
which may be interrupted by "introns" [...] only RNA expression or protein production [...] expression. In certain 
cases, [...] in such a way that one gene will overlap the other. [...] of an organism, artificial chromosome, 
plasmid, vector, etc., or as a separate [...] 

[...] coding region sequence [...] is heterologous to [...] a sequence encoding [...] such as UTRs or 3' end 
termination sequences that do not originate in [...] are considered heterologous to said coding [...] 
contiguous to each other are not heterologous to each other [...] heterologous to other filler sequences [...] 
expressing an amino acid [...] corn gene operatively linked in a novel manner [...] functional domain [...] 
[...] polynucleotides of the present invention, is PARSK1, the promoter from the Arabidopsis gene encoding a serine-threonine 
kinase enzyme, and which promoter is induced by dehydration, abscisic acid and sodium chloride (Wang and Goodman, 
Plant J. 8:37 (1995)). Examples of environmental conditions that may affect transcription by inducible promoters 
include anaerobic conditions, elevated temperature, or the presence of light. 

[0057] Intergenic region: "Intergenic region," as used in the current invention, refers to nucleotide sequence 
occurring in the genome that separates adjacent genes. 

[0058] Mutant gene: In the current invention, "mutant" refers to a heritable change in DNA sequence at a specific 
location. Mutants of the current invention may or may not have an associated identifiable function when the mutant 
gene is transcribed. 

[0059] Orthologous Gene: In the current invention, "orthologous gene" refers to a second gene that encodes a 
gene product that performs a similar function as the product of a first gene. The orthologous gene may also have a 
degree of sequence similarity to the first gene. The orthologous gene may encode a polypeptide that exhibits a degree 
of sequence similarity to a polypeptide corresponding to a first gene. The sequence similarity can be found within a 
functional domain or along the entire length of the coding sequence of the genes and/or their corresponding 
polypeptides. 

[0060] Percentage of sequence identity: "Percentage of sequence identity," as used herein, is determined by 
comparing two optimally aligned sequences over a comparison window, where the fragment of the polynucleotide or 
amino acid sequence in the comparison window may comprise additions or deletions (e.g., gaps or overhangs) as 
compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two 
sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid 
base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number 
of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 
to yield the percentage of sequence identity. Optimal alignment of sequences for comparison may be conducted by 
the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment 
algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and 
Lipman, Proc. Natl. Acad. Sci. (USA) 85:2444 (1988), by computerized implementations of these algorithms (GAP, 
BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group 
(GCG), 575 Science Dr., Madison, WI), or by inspection. Given that two sequences have been identified for comparison, 
GAP and BESTFIT are preferably employed to determine their optimal alignment. Typically, the default values of 5.00 
for gap weight and 0.30 for gap weight length are used. The term "substantial sequence identity" between polynucleotide 
or polypeptide sequences refers to a polynucleotide or polypeptide comprising a sequence that has at least 80% sequence 
identity, preferably at least 85%, more preferably at least 90%, most preferably at least 95%, and even more 
preferably at least 96%, 97%, 98% or 99% sequence identity compared to a reference sequence using the programs. 
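
The calculation described above (matched positions divided by the total number of positions in the comparison window, multiplied by 100) can be illustrated with a short Python sketch. It assumes the two sequences have already been aligned by one of the cited programs and are supplied as equal-length strings in which "-" marks a gap; the function name `percent_identity` is hypothetical, and the sketch does not perform the alignment itself.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of sequence identity over an alignment window.

    aligned_a, aligned_b: equal-length aligned sequences; `gap` marks
    additions or deletions introduced by the alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    # A matched position carries the identical base or residue in both sequences.
    matched = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    # Divide by the total number of positions in the window, gaps included.
    return 100.0 * matched / len(aligned_a)


if __name__ == "__main__":
    print(round(percent_identity("ACGT-A", "ACCTTA"), 2))
```

A result can then be compared against the 80%, 85%, 90% or 95% thresholds named in the definition of substantial sequence identity.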

[0061] Plant Promoter: A "plant promoter" is a promoter capable of initiating transcription in plant cells and can 
drive or facilitate transcription of a fragment of the SDF of the instant invention or a coding sequence of the SDF of the 
instant invention. Such promoters need not be of plant origin. For example, promoters derived from plant viruses, such 
as the CaMV35S promoter, or from Agrobacterium tumefaciens, such as the T-DNA promoters, can be plant promoters. 
A typical example of a plant promoter of plant origin is the maize ubiquitin-1 (ubi-1) promoter known to those of skill. 

[0062] Promoter: The term "promoter," as used herein, refers to a region of sequence determinants located 
upstream from the start of transcription of a gene and which are involved in recognition and binding of RNA polymerase 
and other proteins to initiate and modulate transcription. A basal promoter is the minimal sequence necessary for 
assembly of a transcription complex required for transcription initiation. Basal promoters frequently include a "TATA 
box" element usually located between 15 and 35 nucleotides upstream from the site of initiation of transcription. Basal 
promoters also sometimes include a "CCAAT box" element (typically a sequence CCAAT) and/or a GGGCG sequence, 
usually located between 40 and 200 nucleotides, preferably 60 to 120 nucleotides, upstream from the start site of 
transcription. 

[0063] Public sequence: The term "public sequence," as used in the context of the instant application, refers to 
any sequence that has been deposited in a publicly accessible database. This term encompasses both amino acid 
and nucleotide sequences. Such sequences are publicly accessible, for example, on the BLAST databases on the 
NCBI FTP web site (accessible at ncbi.nlm.gov/blast). The database at the NCBI FTP site utilizes gi numbers assigned 
by NCBI as a unique identifier for each sequence in the databases, thereby providing a non-redundant database for 
sequence from various databases, including GenBank, EMBL, DDBJ (DNA Database of Japan) and PDB (Brookhaven 
Protein Data Bank). 

[0064] Regulatory Sequence: The term "regulatory sequence," as used in the current invention, refers to any 
nucleotide sequence that influences transcription or translation initiation and rate, and stability and/or mobility of the 
transcript or polypeptide product. Regulatory sequences include, but are not limited to, promoters, promoter control 
elements, protein binding sequences, 5' and 3' UTRs, transcriptional start site, termination sequence, polyadenylation 
sequence, introns, certain sequences within a coding sequence, etc. 
[0065] Related Sequences: "Related sequences" refer to either a polypeptide or a nucleotide sequence that 
exhibits some degree of sequence similarity with a sequence described by the REF and SEQ tables. 

[0066] Scaffold Attachment Region (SAR): As used herein, "scaffold attachment region" is a DNA sequence that 
anchors chromatin to the nuclear matrix or scaffold to generate loop domains that can have either a transcriptionally 
active or inactive structure (Spiker and Thompson (1996) Plant Physiol. 110:15-21). 

[0067] Sequence-determined DNA fragments (SDFs): "Sequence-determined DNA fragments" as used in the 
current invention are isolated sequences of genes, fragments of genes, intergenic regions or contiguous DNA from 
plant genomic DNA or cDNA or RNA the sequence of which has been determined. 

[0068] Signal Peptide: A "signal peptide" as used in the current invention is an amino acid sequence that targets 
the protein for secretion, for transport to an intracellular compartment or organelle or for incorporation into a membrane. 
Signal peptides are indicated in the tables and a more detailed description is located below. 

[0069] Specific Promoter In the context of the current invention, "specific promoters" refers to a subset of inducible
promoters that have a high preference for being induced in a specific tissue or cell and/or at a specific time during
development of an organism. By "high preference" is meant at least 3-fold, preferably 5-fold, more preferably at least
10-fold, still more preferably at least 20-fold, 50-fold or 100-fold increase in transcription in the desired tissue over the
transcription in any other tissue. Typical examples of temporal and/or tissue specific promoters of plant origin that can
be used with the polynucleotides of the present invention are: PTA29, a promoter which is capable of driving gene
transcription specifically in tapetum and only during anther development (Koltonow et al., Plant Cell 2:1201 (1990));
RCc2 and RCc3, promoters that direct root-specific gene transcription in rice (Xu et al., Plant Mol. Biol. 27:237 (1995));
and TobRB27, a root-specific promoter from tobacco (Yamamoto et al., Plant Cell 3:371 (1991)). Examples of tissue-specific
promoters under developmental control include promoters that initiate transcription only in certain tissues or organs
such as root, ovule, fruit, seeds, or flowers. Other suitable promoters include those from genes encoding storage
proteins or the lipid body membrane protein, oleosin. A few root-specific promoters are noted above.
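
The "high preference" thresholds can be illustrated with a short sketch. It assumes that transcription has been measured in arbitrary but comparable units for the desired tissue and for every other tissue; the function names and the sample values are hypothetical and are not part of the specification.

```python
# Fold thresholds named in the definition of "high preference", highest first.
PREFERENCE_TIERS = (100, 50, 20, 10, 5, 3)


def fold_preference(target_level, other_levels):
    """Fold increase of the desired tissue over the highest-expressing other tissue.

    Comparing against the maximum of the other tissues enforces the
    "over the transcription in any other tissue" wording.
    """
    highest_other = max(other_levels)
    if highest_other <= 0:
        return float("inf")
    return target_level / highest_other


def highest_tier_met(fold):
    """Return the largest threshold satisfied, or None if below 3-fold."""
    for tier in PREFERENCE_TIERS:
        if fold >= tier:
            return tier
    return None


# Hypothetical measurements: root is the desired tissue.
root_level = 120.0
other_tissue_levels = [4.0, 10.0, 2.0]
fold = fold_preference(root_level, other_tissue_levels)
tier = highest_tier_met(fold)
```

In this made-up example the promoter is 12-fold more active in root than in the next most active tissue, so it meets the 10-fold tier but not the 20-fold tier.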
[0070] Stringency "Stringency" as used herein is a function of probe length, probe composition (G + C content),
salt concentration, organic solvent concentration, and temperature of hybridization or wash conditions. Stringency
is typically compared by the parameter Tm, which is the temperature at which 50% of the complementary molecules
in the hybridization are hybridized, in terms of a temperature differential from Tm. High stringency conditions are those
providing a condition of Tm - 5°C to Tm - 10°C. Medium or moderate stringency conditions are those providing Tm -
20°C to Tm - 29°C. Low stringency conditions are those providing a condition of Tm - 40°C to Tm - 48°C. The relationship
of hybridization conditions to Tm (in °C) is expressed in the mathematical equation

Tm = 81.5 + 16.6(log10[Na+]) + 0.41(%G+C) - (600/N)    (1)

where N is the length of the probe. This equation works well for probes 14 to 70 nucleotides in length that are identical 
to the target sequence. The equation below for Tm of DNA-DNA hybrids is useful for probes in the range of 50 to greater
than 500 nucleotides, and for conditions that include an organic solvent (formamide). 

Tm = 81.5 + 16.6 log{[Na+]/(1 + 0.7[Na+])} + 0.41(%G+C) - 500/L - 0.63(%formamide)    (2)

where L is the length of the probe in the hybrid. (P. Tijessen, "Hybridization with Nucleic Acid Probes" in Laboratory
Techniques in Biochemistry and Molecular Biology, P.C. van der Vliet, ed., c. 1993 by Elsevier, Amsterdam.) The Tm
of equation (2) is affected by the nature of the hybrid; for DNA-RNA hybrids Tm is 10-15°C higher than calculated, for
RNA-RNA hybrids Tm is 20-25°C higher. Because the Tm decreases about 1°C for each 1% decrease in homology
when a long probe is used (Bonner et al., J. Mol. Biol. 81:123 (1973)), stringency conditions can be adjusted to favor
detection of identical genes or related family members.
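
The two Tm relationships and the stringency windows can be sketched in a few lines of Python. This is a minimal illustration, not a validated calculator: the function names are hypothetical, the sodium concentration is assumed to be molar, the logarithm in equation (2) is assumed to be base 10, and the salt term is taken with a positive coefficient in both equations, which is the conventional form of the relationship.

```python
import math


def tm_short_probe(na_molar, gc_percent, n):
    """Equation (1): Tm in degrees C for probes of roughly 14 to 70 nucleotides."""
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / n


def tm_long_probe(na_molar, gc_percent, length, formamide_percent=0.0):
    """Equation (2): Tm in degrees C for DNA-DNA hybrids, with optional formamide."""
    salt_term = math.log10(na_molar / (1.0 + 0.7 * na_molar))
    return (81.5 + 16.6 * salt_term + 0.41 * gc_percent
            - 500.0 / length - 0.63 * formamide_percent)


# Offsets below Tm (degrees C) that define each stringency class in the text,
# stored as (smallest offset, largest offset).
STRINGENCY_OFFSETS = {"high": (5, 10), "medium": (20, 29), "low": (40, 48)}


def stringency_window(tm, level):
    """Return the (coldest, warmest) temperature for the named stringency class."""
    smallest, largest = STRINGENCY_OFFSETS[level]
    return (tm - largest, tm - smallest)


# Hypothetical 20-mer, 50% G+C, in 1 M sodium.
tm_primer = tm_short_probe(1.0, 50.0, 20)
high_window = stringency_window(tm_primer, "high")
```

The DNA-RNA and RNA-RNA corrections mentioned above are not applied here; a caller would add them to the value returned by `tm_long_probe`.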

[0071] Equation (2) is derived assuming equilibrium and therefore, hybridizations according to the present invention 
are most preferably performed under conditions of probe excess and for sufficient time to achieve equilibrium. The
time required to reach equilibrium can be shortened by inclusion of a hybridization accelerator such as dextran sulfate 
or another high volume polymer in the hybridization buffer. 

[0072] Stringency can be controlled during the hybridization reaction or after hybridization has occurred by altering
the salt and temperature conditions of the wash solutions used. The formulas shown above are equally valid when
used to compute the stringency of a wash solution. Preferred wash solution stringencies lie within the ranges stated
above; high stringency is 5-8°C below Tm, medium or moderate stringency is 26-29°C below Tm and low stringency is

[0073] Substantially free of A composition containing A is "substantially free of" B when at least 85% by weight
of the total A+B in the composition is A. Preferably, A comprises at least about 90% by weight of the total of A+B in
the composition, more preferably at least about 95% or even 99% by weight. For example, a plant gene or a DNA
sequence can be considered substantially free of other plant genes or DNA sequences.
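
As a small arithmetic illustration of this definition, the weight fraction of A can be checked against the 85% threshold. The function names and the sample weights below are hypothetical.

```python
def weight_fraction_a(weight_a, weight_b):
    """Fraction of the combined weight of A and B that is A."""
    return weight_a / (weight_a + weight_b)


def is_substantially_free(weight_a, weight_b, threshold=0.85):
    """True when A makes up at least `threshold` of the total A+B by weight."""
    return weight_fraction_a(weight_a, weight_b) >= threshold


# Hypothetical preparation: 90 ng of the DNA of interest, 10 ng of other plant DNA.
meets_minimum = is_substantially_free(90.0, 10.0)
meets_preferred = is_substantially_free(90.0, 10.0, threshold=0.95)
```

The `threshold` argument lets the same check be repeated at the preferred 90%, 95% or 99% levels.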

[0074] Translational start site In the context of the current invention, a "translational start site" is usually an ATG
in the cDNA transcript, more usually the first ATG. A single cDNA, however, may have multiple translational start sites. 
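
A minimal sketch of how candidate translational start sites might be listed for a cDNA sequence follows. It only reports ATG positions; it does not score sequence context or check for an open reading frame, and the function name and the sample sequence are hypothetical.

```python
def find_atg_positions(cdna):
    """Return the 0-based positions of every ATG in the sequence, in order."""
    sequence = cdna.upper()
    positions = []
    start = sequence.find("ATG")
    while start != -1:
        positions.append(start)
        # Resume one base further on so overlapping matches are not skipped.
        start = sequence.find("ATG", start + 1)
    return positions


candidates = find_atg_positions("ccATGaaATGtt")
first_atg = candidates[0] if candidates else None
```

Here `first_atg` is the "more usual" start site in the sense of the definition, while `candidates` keeps the alternatives a single cDNA may contain.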
[0075] Transcription start site "Transcription start site" is used in the current invention to describe the point at
which transcription is initiated. This point is typically located about 25 nucleotides downstream from a TFIID binding 
site, such as a TATA box. Transcription can initiate at one or more sites within the gene, and a single gene may have 
multiple transcriptional start sites, some of which may be specific for transcription in a particular cell-type or tissue. 
[0076] Untranslated region (UTR) A "UTR" is any contiguous series of nucleotide bases that is transcribed, but
is not translated. These untranslated regions may be associated with particular functions such as increasing mRNA
message stability. Examples of UTRs include, but are not limited to, polyadenylation signals, termination sequences,
sequences located between the transcriptional start site and the first exon (5' UTR) and sequences located between
the last exon and the end of the mRNA (3' UTR).

[0077] Variant: The term "variant" is used herein to denote a polypeptide or protein or polynucleotide molecule
that differs from others of its kind in some way. For example, polypeptide and protein variants can consist of changes
in amino acid sequence and/or charge and/or post-translational modifications (such as glycosylation, etc.).

DETAILED DESCRIPTION OF THE INVENTION 

I. Polynucleotides 

[0078] Exemplified SDFs of the invention represent fragments of the genome of corn, wheat, rice, soybean or Arabidopsis
and/or represent mRNA expressed from that genome. The isolated nucleic acid of the invention also encompasses
corresponding fragments of the genome and/or cDNA complement of other organisms as described in detail
below. 

[0079] Polynucleotides of the invention can be isolated from polynucleotide libraries using primers comprising sequence
similar to those described by the REF and SEQ Tables. See, for example, the methods described in Sambrook
et al., supra.

[0080] Alternatively, the polynucleotides of the invention can be produced by chemical synthesis. Such synthesis
methods are described below. 

[0081] It is contemplated that the nucleotide sequences presented herein may contain some small percentage of 
errors. These errors may arise in the normal course of determination of nucleotide sequences. Sequence errors can 
be corrected by obtaining seeds deposited under the accession numbers cited above, propagating them, isolating 
genomic DNA or appropriate mRNA from the resulting plants or seeds thereof, amplifying the relevant fragment of the 
genomic DNA or mRNA using primers having a sequence that flanks the erroneous sequence, and sequencing the 
amplification product. 

I.A. Probes, Primers and Substrates

[0082] SDFs of the invention can be applied to substrates for use in array applications such as, but not limited to,
assays of global gene expression, for example under varying developmental and growth conditions. The arrays
can also be used in diagnostic or forensic methods (WO95/35505, US 5,445,943 and US 5,410,270).
[0083] Probes and primers of the instant invention will hybridize to a polynucleotide comprising a sequence in REF
and SEQ TABLES 1 AND 2. Though many different nucleotide sequences can encode an amino acid sequence, the
sequences of REF and SEQ TABLES 1 AND 2 are generally preferred for encoding polypeptides of the invention.
However, the sequence of the probes and/or primers of the instant invention need not be identical to those in REF and
SEQ TABLES 1 AND 2 or the complements thereof. For example, some variation in probe or primer sequence and/or
length can allow additional family members to be detected, as well as orthologous genes and more taxonomically
distant related sequences. Similarly, probes and/or primers of the invention can include additional nucleotides that
serve as a label for detecting the formed duplex or for subsequent cloning purposes.

[0084] Probe length will vary depending on the application. For use as primers, probes are 12-40 nucleotides, preferably
18-30 nucleotides long. For use in mapping, probes are 50 to 500 nucleotides, preferably 100-250
nucleotides long. For Southern hybridizations, probes as long as several kilobases can be used as explained below.
[0085] The probes and/or primers can be produced by synthetic procedures such as the triester method of Matteucci
et al., J. Am. Chem. Soc. 103:3185 (1981); or according to Urdea et al., Proc. Natl. Acad. 80:7461 (1981); or using
commercially available automated oligonucleotide synthesizers.

I.B. Methods of Detection and Isolation



[0086] The polynucleotides of the invention can be utilized in a number of methods [...] as probes and/or primers
to isolate and detect polynucleotides [...] Southerns, Northerns, branched DNA hybridization assays, polymerase
chain reaction [...] methods given by way of example [...]:



Hybridization 
Methods of Mapping 
Southern Blotting 

Isolating cDNA from Related Organisms
Isolating and/or Identifying Orthologous Genes. 



Also, the nucleic acid molecules of the invention can be used in other [...]



B.1. Hybridization



[0087] The isolated SDFs of REF and SEQ TABLES 1 AND 2 [...] or primers for detection and/or isolation of [...]
through hybridization. Hybridization of one nucleic acid to another constitutes a physical [...] related sequences.
Also, such hybridization [...] is the same when aligned for maximum correspondence as described below.

[...] the polypeptide produced from the gene [...] base sequence that has been changed from a sequence in REF and
SEQ TABLES 1 AND 2 by substitution in accordance with degeneracy of the genetic code [...]

B.2. Mapping
associated with a phenotype, all SDFs can be used as probes for identifying polymorphisms associated with phenotypes
of interest. Briefly, one method of mapping involves total DNA isolation from individuals. It is subsequently cleaved with
one or more restriction enzymes, separated according to mass, transferred to a solid support, hybridized with SDF
DNA and the pattern of fragments compared. Polymorphisms associated with a particular SDF are visualized as differences
in the size of fragments produced between individual DNA samples after digestion with a particular restriction
enzyme and hybridization with the SDF. After identification of polymorphic SDF sequences, linkage studies can be
conducted. By using the individuals showing polymorphisms as parents in crossing programs, F2 progeny recombinants
or recombinant inbreds, for example, are then analyzed. The order of DNA polymorphisms along the chromosomes
can be determined based on the frequency with which they are inherited together versus independently. The closer
two polymorphisms are together in a chromosome, the higher the probability that they are inherited together. Integration
of the relative positions of all the polymorphisms and associated marker SDFs can produce a genetic map of the
species, where the distances between markers reflect the recombination frequencies in that chromosome segment.
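
The ordering logic described here can be sketched for three markers. The sketch assumes each marker has been scored in the same progeny individuals as parental type "A" or "B" with no missing data, and it uses the raw fraction of discordant individuals as a stand-in for recombination frequency; no mapping function is applied. The marker names, genotype strings and function names are hypothetical.

```python
from itertools import combinations


def discordance(genotypes_1, genotypes_2):
    """Fraction of individuals in which two markers carry different parental alleles."""
    if len(genotypes_1) != len(genotypes_2):
        raise ValueError("markers must be scored in the same individuals")
    differing = sum(a != b for a, b in zip(genotypes_1, genotypes_2))
    return differing / len(genotypes_1)


def pairwise_discordance(markers):
    """Map each unordered marker pair to its discordance fraction."""
    return {
        frozenset((m1, m2)): discordance(markers[m1], markers[m2])
        for m1, m2 in combinations(markers, 2)
    }


def order_three_markers(markers):
    """Order three markers: the least often co-inherited pair forms the two ends."""
    fractions = pairwise_discordance(markers)
    ends = max(fractions, key=fractions.get)
    middle = (set(markers) - ends).pop()
    first, last = sorted(ends)
    return [first, middle, last]


# Hypothetical scores for ten progeny individuals.
scores = {
    "sdf1": "AAAAABBBBB",
    "sdf2": "AAAABBBBBB",
    "sdf3": "AABABBBBAB",
}
marker_order = order_three_markers(scores)
```

With these made-up scores, sdf1 and sdf3 are separated most often, so sdf2 is placed between them.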
[0094] The use of recombinant inbred lines for such genetic mapping is described for Arabidopsis by Alonso-Blanco
et al. (Methods in Molecular Biology, Vol. 82, "Arabidopsis Protocols", pp. 137-146, J.M. Martinez-Zapater and J. Salinas,
eds., c. 1998 by Humana Press, Totowa, NJ) and for corn by Burr ("Mapping Genes with Recombinant Inbreds", pp.
249-254, in Freeling, M. and V. Walbot (Ed.), The Maize Handbook, c. 1994 by Springer-Verlag New York, Inc.: New
York, NY, USA; Berlin, Germany; Burr et al., Genetics (1998) 118: 519; Gardiner, J. et al., (1993) Genetics 134: 917).
This procedure, however, is not limited to plants and can be used for other organisms (such as yeast) or for individual
cells.

[0095] The SDFs of the present invention can also be used for simple sequence repeat (SSR) mapping. Rice SSR
mapping is described by Morgante et al. (The Plant Journal (1993) 3: 165), Panaud et al. (Genome (1995) 38: 1170),
Senior et al. (Crop Science (1996) 36: 1676), Taramino et al. (Genome (1996) 39: 277) and Ahn et al. (Molecular and
General Genetics (1993) 241: 483-90). SSR mapping can be achieved using various methods. In one instance, polymorphisms
are identified when sequence specific probes contained within an SDF flanking an SSR are made and used
in polymerase chain reaction (PCR) assays with template DNA from two or more individuals of interest. Here, a change
in the number of tandem repeats between the SSR-flanking sequences produces differently sized fragments (U.S.
Patent 5,766,847). Alternatively, polymorphisms can be identified by using the PCR fragment produced from the SSR-
flanking sequence specific primer reaction as a probe against Southern blots representing different individuals (U.H.
Refseth et al., (1997) Electrophoresis 18: 1519).
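
The size difference that makes an SSR detectable by PCR follows directly from the repeat count. A minimal sketch, assuming the amplified fragment consists of the two flanking regions (primers included) plus a perfect tandem repeat; the flank lengths, repeat unit and repeat counts are hypothetical.

```python
def ssr_amplicon_length(left_flank_len, right_flank_len, repeat_unit, repeat_count):
    """Length of the PCR product spanning an SSR with the given repeat count."""
    return left_flank_len + right_flank_len + len(repeat_unit) * repeat_count


def is_polymorphic(amplicon_lengths):
    """True when the individuals do not all give the same fragment size."""
    return len(set(amplicon_lengths)) > 1


# Two hypothetical individuals carrying 12 and 15 copies of a GA repeat.
individual_1 = ssr_amplicon_length(60, 55, "GA", 12)
individual_2 = ssr_amplicon_length(60, 55, "GA", 15)
polymorphic = is_polymorphic([individual_1, individual_2])
```

The three extra dinucleotide repeats add six base pairs to the second fragment, which is the kind of difference resolved on a gel.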

[0096] Genetic and physical maps of crop species have many uses. For example, these maps can be used to devise
positional cloning strategies for isolating novel genes from the mapped crop species. In addition, because the genomes
of closely related species are largely syntenic (that is, they display the same ordering of genes within the genome),
these maps can be used to isolate novel alleles from relatives of crop species by positional cloning strategies.
[0097] The various types of maps discussed above can be used with the SDFs of the invention to identify Quantitative
Trait Loci (QTLs). Many important crop traits, such as the solids content of tomatoes, are quantitative traits and result
from the combined interactions of several genes. These genes reside at different loci in the genome, oftentimes on
different chromosomes, and generally exhibit multiple alleles at each locus. The SDFs of the invention can be used to
identify QTLs and isolate specific alleles as described by de Vicente and Tanksley (Genetics 134:585 (1993)). In addition
to isolating QTL alleles in present crop species, the SDFs of the invention can also be used to isolate alleles from the
corresponding QTL of wild relatives. Transgenic plants having various combinations of QTL alleles can then be created
and the effects of the combinations measured. Once a desired allele combination has been identified, crop improvement
can be accomplished either through biotechnological means or by directed conventional breeding programs (for review
see Tanksley and McCouch, Science 277: 1063 (1997)).

[0098] In another embodiment, the SDFs can be used to help create physical maps of the genome of corn, Arabidopsis
and related species. Where SDFs have been ordered on a genetic map, as described above, they can be used
as probes to discover which clones in large libraries of plant DNA fragments in YACs, BACs, etc. contain the same
SDF or similar sequences, thereby facilitating the assignment of the large DNA fragments to chromosomal positions.
Subsequently, the large BACs, YACs, etc. can be ordered unambiguously by more detailed studies of their sequence
composition (e.g. Marra et al. (1997) Genome Research 7:1072-1084) and by using their end or other sequences to
find the identical sequences in other cloned DNA fragments. The overlapping of DNA sequences in this way allows
large contigs of plant sequences to be built that, when sufficiently extended, provide a complete physical map of a
chromosome. Sometimes the SDFs themselves will provide the means of joining cloned sequences into a contig.
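
The idea that shared SDF hits join clones into a contig can be sketched as a simple grouping step. This assumes the hybridization results are already available as a set of SDF names per clone and treats any shared SDF as evidence of overlap; it groups clones but does not order them within a contig. The clone and SDF names and the function name are hypothetical.

```python
def build_contigs(clone_hits):
    """Group clones into contigs, linking any two clones that share an SDF.

    clone_hits maps a clone name to the set of SDFs that hybridize to it.
    """
    remaining = set(clone_hits)
    contigs = []
    while remaining:
        seed = remaining.pop()
        contig = {seed}
        frontier = [seed]
        while frontier:
            current = frontier.pop()
            # Clones not yet assigned that share at least one SDF with `current`.
            linked = {c for c in remaining if clone_hits[c] & clone_hits[current]}
            remaining -= linked
            contig |= linked
            frontier.extend(linked)
        contigs.append(contig)
    return contigs


# Hypothetical hybridization results.
hits = {
    "BAC1": {"sdfA", "sdfB"},
    "BAC2": {"sdfB", "sdfC"},
    "BAC3": {"sdfC"},
    "YAC1": {"sdfD"},
}
contigs = build_contigs(hits)
```

BAC1 and BAC3 share no SDF directly but end up in the same contig through BAC2, which mirrors how overlapping clones extend a contig.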
[0099] The patent publication WO95/35505 and U.S. Patents 5,445,943 and 5,410,270 describe scanning multiple
alleles of a plurality of loci using hybridization to arrays of oligonucleotides. These techniques are useful for each of
the types of mapping discussed above.

[0100] Following the procedures described above and using a plurality of the SDFs of the present invention, any
individual can be genotyped. These individual genotypes can be used for the identification of particular cultivars, varieties,
lines, ecotypes and genetically modified plants or can serve as tools for subsequent genetic studies involving




multiple phenotypic traits.

B.3 Southern Blot Hybridization



[0101] The sequences from REF and SEQ TABLES 1 AND 2 can be used as probes for various hybridization techniques.
These techniques are useful for detecting target polynucleotides in a sample or for determining whether transgenic
plants, seeds or host cells harbor a gene or sequence of interest and thus might be expected to exhibit a particular
trait or phenotype.

[0102] In addition, the SDFs from the invention can be used to isolate additional members of gene families from the
same or different species and/or orthologous genes from the same or different species. This is accomplished by hybridizing
an SDF to, for example, a Southern blot containing the appropriate genomic DNA or cDNA. Given the resulting
hybridization data, one of ordinary skill in the art could distinguish and isolate the correct DNA fragments by size,
restriction sites, sequence and stated hybridization conditions from a gel or from a library.

[...] orthologous genes from closely related species and alleles within a species is
particularly desirable because of their potential for crop improvement. Many important crop traits, such as the solids
content of tomatoes, result from the combined interactions of the products of several genes residing at different loci in
the genome. Generally, alleles at each of these loci can make quantitative differences to the trait. By identifying and
isolating numerous alleles for each locus from within or different species, transgenic plants with various combinations
of alleles can be created and the effects of the combinations measured. Once a desired allele combination has
been identified, crop improvement can be accomplished either through biotechnological means or by directed conventional
breeding programs (Tanksley et al., Science 277:1063 (1997)).

[...] of the SDFs of the invention to, for example, Southern blots containing DNA
from another species can also be used to generate restriction fragment maps for the corresponding genomic regions.
[...] within fragments, further distinguishing mapped DNA from the remainder of the genome.

[...] by digesting genomic DNA with different combinations of restriction enzymes.

[...] can range in size from [...] 20 nucleotides to several thousand nucleotides. More preferably, the probe is 100 to
1,000 nucleotides long for identifying members of a gene family when it is found that repetitive sequences would
complicate the hybridization. [...] preferably the length of the gene [...] however, might require [...] nucleotides long
or overlapping probes constituting the full-length sequence to span their lengths.
[0107] Also, while it is preferred that the probe be homogeneous with respect to its sequence, it is not necessary.
[...] a gene family having diverse sequences [...]

[...] corresponding genes in another species, the next most preferable probe is a cDNA spanning
[...] fragment of the gene to be identified. [...] primers [...] the ends of the SDF [...] Arabidopsis genomic DNA as a
template. In instances where the SDF includes sequence conserved among species, primers including the conserved
sequence can be used for PCR with genomic DNA from a species [...] SDF can be used to [...] containing the domain.
[...] electrophoresis, and cloned and/or sequenced. [...]

B.4.1 Isolating DNA from Related Organisms

[...] of the invention can be used to isolate the corresponding DNA from other organisms. Either cDNA
or genomic DNA can be isolated. For isolating genomic DNA, a lambda, cosmid, BAC or YAC, or other large insert
genomic library from the plant of interest can be constructed using standard molecular biology techniques as described
in detail by Sambrook et al. 1989 (Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory
[...]) [...] Greene Publishing [...].

[0110] To [...] recombinant lambda clones are plated out on appropriate bacterial [...] host strain. The resulting
plaques are lifted from the plates using [...] denaturation, neutralization, and washing treatments following
the standard protocols outlined by Ausubel et al. (1992). The plaque lifts are hybridized to either radioactively labeled
or non-radioactively labeled SDF DNA at room temperature for about 16 hours, usually in the presence of 50% formamide
and 5X SSC (sodium chloride and sodium citrate) buffer and blocking reagents. The plaque lifts are then washed
at 42°C with 1% Sodium Dodecyl Sulfate (SDS) and at a particular concentration of SSC. The SSC concentration used
is dependent upon the stringency at which hybridization occurred in the initial Southern blot analysis performed. For
example, if a fragment hybridized under medium stringency (e.g., Tm - 20°C), then this condition is maintained or
preferably adjusted to a less stringent condition (e.g., Tm - 30°C) to wash the plaque lifts. Positive clones show detectable
hybridization, e.g., by exposure to X-ray films or chromogen formation. The positive clones are then subsequently
isolated for purification using the same general protocol outlined above. Once the clone is purified, restriction analysis
can be conducted to narrow the region corresponding to the gene of interest. The restriction analysis and succeeding
subcloning steps can be done using procedures described by, for example, Sambrook et al. (1989) cited above.
[0111] The procedures outlined for the lambda library are essentially similar to those used for YAC library screening,
except that the YAC clones are harbored in bacterial colonies. The YAC clones are plated out at reasonable density
on nitrocellulose or nylon filters supported by appropriate bacterial medium in petri plates. Following the growth of the
bacterial clones, the filters are processed through the denaturation, neutralization, and washing steps following the
procedures of Ausubel et al. 1992. The same hybridization procedures for lambda library screening are followed.
[0112] To isolate cDNA, similar procedures using appropriately modified vectors are employed. For instance, the
library can be constructed in a lambda vector appropriate for cloning cDNA such as λgt11. Alternatively, the cDNA
library can be made in a plasmid vector. cDNA for cloning can be prepared by any of the methods known in the art,
but is preferably prepared as described above. Preferably, a cDNA library will include a high proportion of full-length
clones.

B.5. Isolating and/or Identifying Orthologous Genes
[0113] Probes and primers of the invention can be used to identify and/or isolate polynucleotides related to those in
REF and SEQ TABLES 1 AND 2. Related polynucleotides are those that are native to other plant organisms and exhibit
either similar sequence or encode polypeptides with similar biological activity. One specific example is an orthologous
gene. Orthologous genes have the same functional activity. As such, orthologous genes may be distinguished from
homologous genes. The percentage of identity is a function of evolutionary separation and, in closely related species,
the percentage of identity can be 98 to 100%. The amino acid sequence of a protein encoded by an orthologous gene
can be less than 75% identical, but tends to be at least 75% or at least 80% identical, more preferably at least 90%,
most preferably at least 95% identical to the amino acid sequence of the reference protein. To find orthologous genes,
the probes are hybridized to nucleic acids from a species of interest under low stringency conditions, preferably one
where sequences containing as much as 40-45% mismatches will be able to hybridize. This condition is established
by Tm - 40°C to Tm - 48°C (see below). Blots are then washed under conditions of increasing stringency. It is preferable
that the wash stringency be such that sequences that are 85 to 100% identical will hybridize. More preferably, sequences
90 to 100% identical will hybridize and most preferably only sequences greater than 95% identical will hybridize. One
of ordinary skill in the art will recognize that, due to degeneracy in the genetic code, amino acid sequences that are
identical can be encoded by DNA sequences as little as 67% identical or less. Thus, it is preferable, for example, to
make an overlapping series of shorter probes, on the order of 24 to 45 nucleotides, and individually hybridize them to
the same arrayed library to avoid the problem of degeneracy introducing large numbers of mismatches.
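
Both points in this paragraph, that identical peptides can be encoded by quite different DNA and that a long sequence can be covered by short overlapping probes, can be illustrated with a brief sketch. The six-codon lookup table is a small subset of the standard genetic code, and the two nine-base sequences, the probe length, the step size and all function names are hypothetical choices made for the example.

```python
# Subset of the standard genetic code: two synonymous codons each for Leu, Ser, Arg.
CODON_SUBSET = {
    "CTT": "L", "TTA": "L",
    "TCT": "S", "AGC": "S",
    "CGT": "R", "AGA": "R",
}


def translate(dna):
    """Translate a sequence made only of codons present in CODON_SUBSET."""
    return "".join(CODON_SUBSET[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(seq_1, seq_2):
    """Percentage of matching positions between two equal-length sequences."""
    if len(seq_1) != len(seq_2):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(seq_1, seq_2))
    return 100.0 * matches / len(seq_1)


def tile_probes(sequence, probe_length=30, step=15):
    """Overlapping probes of fixed length; a step below the length gives overlap."""
    last_start = len(sequence) - probe_length
    return [sequence[i:i + probe_length] for i in range(0, last_start + 1, step)]


seq_a = "CTTTCTCGT"
seq_b = "TTAAGCAGA"
same_peptide = translate(seq_a) == translate(seq_b)
dna_identity = percent_identity(seq_a, seq_b)
probes = tile_probes("ACGT" * 25)
```

The peptide chosen here uses only six-fold degenerate residues, so it is a deliberately extreme case in which the DNA identity falls far below the 67% figure mentioned above; typical coding sequences diverge less.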
[0114] As evolutionary divergence increases, genome sequences also tend to diverge. Thus, one of skill will recognize
that searches for orthologous genes between more divergent species will require the use of lower stringency
conditions compared to searches between closely related species. Also, degeneracy of the genetic code is more of a
problem for searches in the genome of a species more distant evolutionarily from the species that is the source of the
SDF probe sequences.

[0115] The SDFs of the invention can also be used as probes to search for genes that are related to the SDF within
a species. Such related genes are typically considered to be members of a gene family. In such a case, the sequence
similarity will often be concentrated into one or a few fragments of the sequence. The fragments of similar sequence
that define the gene family typically encode a fragment of a protein or RNA that has an enzymatic or structural function.
The percentage of identity in the amino acid sequence of the domain that defines the gene family is preferably at least
70%, more preferably 80 to 95%, most preferably 85 to 99%. To search for members of a gene family within a species,
a low stringency hybridization is usually performed, but this will depend upon the size, distribution and degree of sequence
divergence of domains that define the gene family. SDFs encompassing regulatory regions can be used to
identify coordinately expressed genes by using the regulatory region sequence of the SDF as a probe.
[0116] In the instances where the SDFs are identified as being expressed from genes that confer a particular phenotype,
then the SDFs can also be used as probes to assay plants of different species for those phenotypes.

I.C. Methods to Inhibit Gene Expression

[0117] The nucleic acid molecules of the present invention can be used to inhibit gene transcription and/or translation.
Examples of such methods include, without limitation:

Antisense Constructs;
Ribozyme Constructs;
Chimeraplast Constructs;
Co-Suppression;
Transcriptional Silencing; and
Other Methods of Gene Expression.

C.1 Antisense

[0118] In some instances it is desirable to suppress expression of an endogenous or exogenous gene. A well-known
instance is the FLAVOR-SAVOR™ tomato, in which the gene encoding ACC synthase is inactivated by an antisense
approach, thus delaying softening of the fruit after ripening. See, for example, U.S. Patent No. 5,859,330; U.S. Patent
No. 5,723,766; Oeller et al., Science, 254:437-439 (1991); and Hamilton et al., Nature, 346:284-287 (1990). Also, timing
of flowering can be controlled by suppression of the FLOWERING LOCUS C (FLC); high levels of this transcript are
associated with late flowering, while absence of FLC is associated with early flowering (S.D. Michaels et al., Plant Cell
11:949 (1999)). Also, the transition of apical meristem from production of leaves with associated shoots to flowering is
regulated by TERMINAL FLOWER1, APETALA1 and LEAFY. Thus, when it is desired to induce a transition from shoot
production to flowering, it is desirable to suppress TFL1 expression (S.J. Liljegren, Plant Cell 11:1007 (1999)). As
another instance, arrested ovule development and female sterility result from suppression of the ethylene forming
enzyme but can be reversed by application of ethylene (D. De Martinis et al., Plant Cell 11:1061 (1999)). The ability
to manipulate female fertility of plants is useful in increasing fruit production and creating hybrids.
[0119] In the case of polynucleotides used to inhibit expression of an endogenous gene, the introduced sequence
need not be perfectly identical to a sequence of the target endogenous gene. The introduced polynucleotide sequence 
will typically be at least substantially identical to the target endogenous sequence. 

[0120] Some polynucleotide SDFs in REF and SEQ TABLES 1 AND 2 represent sequences that are expressed in
corn, wheat, rice, soybean, Arabidopsis and/or other plants. Thus the invention includes using these sequences to generate
antisense constructs to inhibit translation and/or degradation of transcripts of said SDFs, typically in a plant cell.
[0121] To accomplish this, a polynucleotide segment from the desired gene that can hybridize to the mRNA expressed
from the desired gene (the "antisense segment") is operably linked to a promoter such that the antisense strand of RNA
will be transcribed when the construct is present in a host cell. A regulated promoter can be used in the construct to
control transcription of the antisense segment so that transcription occurs only under desired circumstances.
[0122] The antisense segment to be introduced generally will be substantially identical to at least a fragment of the
endogenous gene or genes to be repressed. The sequence, however, need not be perfectly identical to inhibit expression.
Further, the antisense product may hybridize to the untranslated region instead of or in addition to the coding
sequence of the gene. The vectors of the present invention can be designed such that the inhibitory effect applies to
other proteins within a family of genes exhibiting homology or substantial homology to the target gene.
[0123] For antisense suppression, the introduced antisense segment sequence also need not be full length relative
to either the primary transcription product or the fully processed mRNA. Generally, a higher percentage of sequence
identity can be used to compensate for the use of a shorter sequence. Furthermore, the introduced sequence need
not have the same intron or exon pattern, and homology of noncoding segments may be equally effective. Normally,
a sequence of between about 30 or 40 nucleotides and the full length of the transcript can be used, though a sequence
of at least about 100 nucleotides is preferred, a sequence of at least about 200 nucleotides is more preferred, and a
sequence of at least about 500 nucleotides is especially preferred.
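
The relationship between a sense-strand segment and the antisense segment derived from it can be shown with a short sketch. It assumes the gene is supplied as a DNA string in sense orientation and simply takes the reverse complement of a chosen window; promoter linkage and construct assembly are outside its scope, and the function names and the sample sequence are hypothetical.

```python
# Translation table for complementing DNA bases, preserving letter case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_segment(sense_sequence, start, length):
    """Reverse complement of a window of the sense strand, starting at a 0-based index."""
    window = sense_sequence[start:start + length]
    return reverse_complement(window)


# Hypothetical sense sequence; a real segment would normally be 30 nucleotides or longer.
sense = "ATGGCCTTT"
segment = antisense_segment(sense, 3, 6)
```

Placed downstream of a promoter in this orientation, such a segment would be transcribed into RNA complementary to the corresponding stretch of the target mRNA.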

C.2. Ribozymes

[0124] It is also contemplated that gene constructs representing ribozymes and based on the SDFs in REF AND
SEQ TABLES 1 AND 2 are an object of the invention. Ribozymes can also be used to inhibit expression of genes by
suppressing the translation of the mRNA into a polypeptide. It is possible to design ribozymes that specifically pair with
virtually any target RNA and cleave the phosphodiester backbone at a specific location, thereby functionally inactivating
the target RNA. In carrying out this cleavage, the ribozyme is not itself altered, and is thus capable of recycling and
cleaving other molecules, making it a true enzyme. The inclusion of ribozyme sequences within antisense RNAs confers
RNA-cleaving activity upon them, thereby increasing the activity of the constructs.
[0125] A number of classes of ribozymes have been identified. One class of ribozymes is derived from a number of
small circular RNAs, which are capable of self-cleavage and replication in plants. The RNAs replicate either alone (viroid
RNAs) or with a helper virus (satellite RNAs). Examples include RNAs from avocado sunblotch viroid and the satellite
RNAs from tobacco ringspot virus, lucerne transient streak virus, velvet tobacco mottle virus, Solanum nodiflorum mottle
virus and subterranean clover mottle virus. The design and use of target RNA-specific ribozymes is described in Haseloff
et al., Nature 334:585 (1988).

[0126] Like the antisense constructs above, the ribozyme sequence fragment necessary for pairing need not be
identical to the target nucleotides to be cleaved, nor identical to the sequences in REF AND SEQ TABLES 1 AND 2.
Ribozymes may be constructed by combining the ribozyme sequence and some fragment of the target gene which
would allow recognition of the target gene mRNA by the resulting ribozyme molecule. Generally, the sequence in the
ribozyme capable of binding to the target sequence exhibits at least 80%, preferably at least 85%, more preferably
at least 90%, most preferably at least 95%, and even more preferably at least 96%, 97%, 98% or 99% sequence
identity to some fragment of a sequence in REF AND SEQ TABLES 1 AND 2 or the complement thereof. The ribozyme
can be equally effective in inhibiting mRNA translation by cleaving either in the untranslated or coding regions.
Generally, a higher percentage of sequence identity can be used to compensate for the use of a shorter sequence.
Furthermore, the introduced sequence need not have the same intron or exon pattern, and homology of non-coding
segments may be equally effective.
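
The identity thresholds in this paragraph can be checked with a simple calculation. The sketch below is a rough approximation under stated assumptions: it uses ungapped, position-by-position comparison as a stand-in for a proper alignment-based identity score, and the names `percent_identity`, `best_window_identity` and `highest_tier_met` are hypothetical helpers, not part of any library or of the original disclosure.

```python
# Illustrative sketch: ungapped identity only; a real comparison would use an alignment tool.

def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def best_window_identity(arm: str, target: str) -> float:
    """Highest ungapped identity of `arm` against any same-length window of `target`."""
    if len(arm) > len(target):
        raise ValueError("arm is longer than target")
    return max(
        percent_identity(arm, target[i:i + len(arm)])
        for i in range(len(target) - len(arm) + 1)
    )


# Thresholds named in the text, from broadest to most stringent.
IDENTITY_TIERS = (80, 85, 90, 95, 96, 97, 98, 99)


def highest_tier_met(identity: float):
    """Return the most stringent stated threshold that is met, or None if below 80%."""
    met = [tier for tier in IDENTITY_TIERS if identity >= tier]
    return met[-1] if met else None


if __name__ == "__main__":
    # Invented 10-nucleotide sequences differing at one position.
    pid = percent_identity("ACGTACGTAC", "ACGTACGTAA")
    print(pid, highest_tier_met(pid))
```

A binding arm would be compared against a fragment of the listed sequence or its complement; the sliding-window helper covers the "some fragment" wording by testing every window of the same length.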



C.3. Chimeraplasts

[0127] The SDFs of the invention, such as those described by the REF and SEQ Tables, can also be used to construct
chimeraplasts that can be introduced into a cell to produce at least one specific nucleotide change in a sequence
corresponding to the SDF of the invention. A chimeraplast is an oligonucleotide comprising DNA and/or RNA that
specifically hybridizes to a target region in a manner which creates a mismatched base-pair. This mismatched base-pair
signals the cell's repair enzyme machinery, which acts on the mismatched region, resulting in the replacement,
insertion or deletion of designated nucleotide(s). The altered sequence is then expressed by the cell's normal cellular
mechanisms. Chimeraplasts can be designed to repair mutant genes, modify genes, introduce site-specific mutations,
and/or act to interrupt or alter normal gene function (U.S. Pat. Nos. 6,010,907 and 6,004,804; and PCT Pub. Nos.
WO 99/58723 and WO 99/07865).

C.4. Sense Suppression
[0128] The SDFs of REF and SEQ TABLES 1 AND 2 of the present invention are also useful to modulate gene
expression by sense suppression. Sense suppression represents another method of gene suppression by introducing
at least one exogenous copy or fragment of the endogenous sequence to be suppressed.

[0129] Introduction of expression cassettes in which a nucleic acid is configured in the sense orientation with respect
to the promoter into the chromosome of a plant or by a self-replicating virus has been shown to be an effective means
by which to induce degradation of mRNAs of target genes. For an example of the use of this method to modulate
expression of endogenous genes see Napoli et al., The Plant Cell 2:279 (1990), and U.S. Patent Nos. 5,034,323,
5,231,020, and 5,283,184. Inhibition of expression may require some transcription of the introduced sequence.
[0130] For sense suppression, the introduced sequence generally will be substantially identical to the endogenous
sequence intended to be inactivated. The minimal percentage of sequence identity will typically be greater than about
65%, but a higher percentage of sequence identity might exert a more effective reduction in the level of normal gene
products. Sequence identity of more than about 80% is preferred, though about 95% to absolute identity would be
most preferred. As with antisense regulation, the effect would likely apply to any other proteins within a similar family
of genes exhibiting homology or substantial homology to the suppressing sequence.

C.5. Transcriptional Silencing 

[0131] The nucleic acid sequences of the invention, including the SDFs of REF and SEQ TABLES 1 AND 2, and
fragments thereof, contain sequences that can be inserted into the genome of an organism resulting in transcriptional
silencing. Such regulatory sequences need not be operatively linked to coding sequences to modulate transcription of
a gene. Specifically, a promoter sequence without any other element of a gene can be introduced into a genome to
transcriptionally silence an endogenous gene (see, for example, Vaucheret, H. et al. (1998) The Plant Journal 16:651-659).
As another example, triple helices can be formed using oligonucleotides based on sequences from REF
AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequences thereto. The oligonucleotide can
be delivered to the host cell and can bind to the promoter in the genome to form a triple helix and prevent transcription.
An oligonucleotide of interest is one that can bind to the promoter and block binding of a transcription factor to the 
C.6. Other Methods [...]

[0132] Yet another means of suppressing [...] homologous recombination [...] thereof, and substantially similar
sequence thereto.

I.D. Methods of [...]

[...] necessary to obtain a desired phenotype [...] genes (J.J. Schwa[...] et al., [...] 12:4 (1992); [...]).
[0138] The SDFs of th[...] Gen. Genet. 261:546 [...]

I.E. Promoters 

[...] constitutive [...] can be useful in [...] in particular cell types.

[0141] Promoters are generally modular [...] more enhancers and/or suppressors that function
as binding sites for additional transcription factors that have the function of modulating the level of transcription with
respect to tissue specificity and of transcriptional responses to particular environmental or nutritional factors and the
like.

[0142] Short DNA sequences representing binding sites for proteins can be separated from each other by intervening
sequences of varying length. For example, within a particular functional module, protein binding sites may be constituted
by regions of 5 to 60, preferably 10 to 30, more preferably 10 to 20 nucleotides. Within such binding sites, there are
typically 2 to 6 nucleotides that specifically contact amino acids of the nucleic acid binding protein. The protein binding
sites are usually separated from each other by 10 to several hundred nucleotides, typically by 15 to 150 nucleotides,
often by 20 to 50 nucleotides. DNA binding sites in promoter elements often display dyad symmetry in their sequence.
Often elements binding several different proteins, and/or a plurality of sites that bind the same protein, will be combined
in a region of 50 to 1,000 base pairs.
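
Dyad symmetry, as mentioned above, means that a site reads the same on both strands, i.e. the sequence equals its own reverse complement. The sketch below shows one way to test for this. It is a simplified illustration: it checks perfect symmetry only (real binding sites are frequently imperfect), the function names are hypothetical, and the example sequences are invented.

```python
# Illustrative sketch: perfect dyad symmetry only; names and sequences are hypothetical.
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_dyad_symmetry(site: str) -> bool:
    """True if the site equals its own reverse complement."""
    return site.upper() == reverse_complement(site)


def dyad_sites(seq: str, width: int) -> List[int]:
    """Start positions (0-based) of every window of `width` with perfect dyad symmetry."""
    seq = seq.upper()
    return [
        i
        for i in range(len(seq) - width + 1)
        if has_dyad_symmetry(seq[i:i + width])
    ]


if __name__ == "__main__":
    print(has_dyad_symmetry("GAATTC"))
    print(dyad_sites("TTGAATTCAA", 6))
```

The window width would be chosen from the site sizes stated in the text (for example 10 to 20 nucleotides); the six-nucleotide example is used only to keep the demonstration short.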

[0143] Elements that have transcription regulatory function can be isolated from their corresponding endogenous
gene; or the desired sequence can be synthesized, and recombined in constructs to direct expression of a coding
region of a gene in a desired tissue-specific, temporal-specific or other desired manner of inducibility or suppression.
When hybridizations are performed to identify or isolate elements of a promoter by hybridization to the long sequences
presented in REF AND SEQ TABLES 1 AND 2, conditions are adjusted to account for the above-described nature of
promoters. For example, short probes, constituting the element sought, are preferably used under low temperature
and/or high salt conditions. When long probes, which might include several promoter elements, are used, low to medium
stringency conditions are preferred when hybridizing to promoters across species.

[0144] If a nucleotide sequence of an SDF, or part of the SDF, functions as a promoter or fragment of a promoter,
then nucleotide substitutions, insertions or deletions that do not substantially affect the binding of relevant DNA binding
proteins would be considered equivalent to the exemplified nucleotide sequence. It is envisioned that there are instances
where it is desirable to decrease the binding of relevant DNA binding proteins to silence or down-regulate a
promoter, or conversely to increase the binding of relevant DNA binding proteins to enhance or up-regulate a promoter.
In such instances, polynucleotides representing changes to the nucleotide sequence of the DNA-protein
contact region by insertion of additional nucleotides, changes to identity of relevant nucleotides, including use of chemically
modified bases, or deletion of one or more nucleotides are considered encompassed by the present invention.
In addition, fragments of the promoter sequences described by the REF and SEQ Tables and variants thereof can be
fused with other promoters or fragments to facilitate transcription and/or transcription in specific types of cells or under
specific conditions.

[0145] Promoter function can be assayed by methods known in the art, preferably by measuring activity of a reporter 
gene operatively linked to the sequence being tested for promoter function. Examples of reporter genes include those 
encoding luciferase, green fluorescent protein, GUS, neo, cat and bar.

I.F. UTRs and Junctions 

[0146] Polynucleotides comprising untranslated (UTR) sequences and intron/exon junctions are also within the scope 
of the invention. UTR sequences include introns and 5' or 3' untranslated regions (5' UTRs or 3' UTRs). Fragments of
the sequences shown in REF AND SEQ TABLES 1 AND 2 can comprise UTRs and intron/exon junctions.
[0147] These fragments of SDFs, especially UTRs, can have regulatory functions related to, for example, translation
rate and mRNA stability. Thus, these fragments of SDFs can be isolated for use as elements of gene constructs for 
regulated production of polynucleotides encoding desired polypeptides. 

[0148] Introns of genomic DNA segments might also have regulatory functions. Sometimes regulatory elements, 
especially transcription enhancer or suppressor elements, are found within introns. Also, elements related to stability 
of heterogeneous nuclear RNA and efficiency of splicing and of transport to the cytoplasm for translation can be found in intron
elements. Thus, these segments can also find use as elements of expression vectors intended for use to transform 
plants. 

[0149] Just as with promoters, UTR sequences and intron/exon junctions can vary from those shown in REF AND
SEQ TABLES 1 AND 2. Such changes from those sequences preferably will not affect the regulatory activity of the
UTRs or intron/exon junction sequences on expression, transcription, or translation unless selected to do so. However,
in some instances, down- or up-regulation of such activity may be desired to modulate traits or phenotypic or in vitro 
activity. 

I.G. Coding Sequences 

[0150] Isolated polynucleotides of the invention can include coding sequences that encode polypeptides comprising
an amino acid sequence encoded by sequences in REF AND SEQ TABLES 1 AND 2 or an amino acid sequence 
presented in REF AND SEQ TABLES 1 AND 2. 



[0151] A nucleotide sequence [...] and the primary transcript is subsequently processed [...] sequence is [...]
by a host cell (or a cell free system) harboring the nucleic acid. Thus, an isolated nucleic acid [...]
which are native to corn [...] of REF AND SEQ TABLES 1 AND 2.

II. Polypeptides and Proteins
The antibodies are also useful for examining the production level of proteins in various tissues, for example in a wild-type
plant or following genetic manipulation of a plant, by methods such as Western blotting.
[0163] Antibodies of the present invention, both polyclonal and monoclonal, may be prepared by conventional methods.
In general, the polypeptides of the invention are first used to immunize a suitable animal, such as a mouse, rat,
rabbit, or goat. Rabbits and goats are preferred for the preparation of polyclonal sera due to the volume of serum
obtainable, and the availability of labeled anti-rabbit and anti-goat antibodies as detection reagents. Immunization is
generally performed by mixing or emulsifying the protein in saline, preferably in an adjuvant such as Freund's complete
adjuvant, and injecting the mixture or emulsion parenterally (generally subcutaneously or intramuscularly). A dose of
50-200 µg/injection is typically sufficient. Immunization is generally boosted 2-6 weeks later with one or more injections
of the protein in saline, preferably using Freund's incomplete adjuvant. One may alternatively generate antibodies by
in vitro immunization using methods known in the art, which for the purposes of this invention is considered equivalent
to in vivo immunization.

[0164] Polyclonal antisera are obtained by bleeding the immunized animal into a glass or plastic container, incubating
the blood at 25°C for one hour, followed by incubating the blood at 4°C for 2-18 hours. The serum is recovered by
centrifugation (e.g., 1,000 x g for 10 minutes). About 20-50 ml per bleed may be obtained from rabbits.
[0165] Monoclonal antibodies are prepared using the method of Kohler and Milstein, Nature 256:495 (1975), or a
modification thereof. Typically, a mouse or rat is immunized as described above. However, rather than bleeding the
animal to extract serum, the spleen (and optionally several large lymph nodes) is removed and dissociated into single
cells. If desired, the spleen cells can be screened (after removal of nonspecifically adherent cells) by applying a cell
suspension to a plate, or well, coated with the protein antigen. B-cells producing membrane-bound immunoglobulin
specific for the antigen bind to the plate, and are not rinsed away with the rest of the suspension. Resulting B-cells, or
all dissociated spleen cells, are then induced to fuse with myeloma cells to form hybridomas, and are cultured in a
selective medium (e.g., hypoxanthine, aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated by
limiting dilution, and are assayed for the production of antibodies which bind specifically to the immunizing antigen
(and which do not bind to unrelated antigens). The selected MAb-secreting hybridomas are then cultured either in vitro
(e.g., in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites in mice).

[0166] Other methods for sustaining antibody-producing B-cell clones, such as by EBV transformation, are known.
[0167] If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional techniques.
Suitable labels include fluorophores, chromophores, radioactive atoms (particularly 32P and 125I), electron-dense reagents,
enzymes, and ligands having specific binding partners. Enzymes are typically detected by their activity. For
example, horseradish peroxidase is usually detected by its ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a
blue pigment, quantifiable with a spectrophotometer.

A.2 In Vitro Applications of Polypeptides

[0168] Some polypeptides of the invention will have enzymatic activities that are useful in vitro. For example, the
soybean trypsin inhibitor (Kunitz) family is one of the numerous families of proteinase inhibitors. It comprises plant
proteins which have inhibitory activity against serine proteinases from the trypsin and subtilisin families, thiol proteinases
and aspartic proteinases. Thus, these peptides find in vitro use in protein purification protocols and perhaps in
therapeutic settings requiring topical application of protease inhibitors.

[0169] Delta-aminolevulinic acid dehydratase (EC 4.2.1.24) (ALAD) catalyzes the second step in the biosynthesis
of heme, the condensation of two molecules of 5-aminolevulinate to form porphobilinogen, and is also involved in chlorophyll
biosynthesis (Kaczor et al. (1994) Plant Physiol. 104: 1411-7; Smith (1988) Biochem. J. 249: 423-8; Schneider
(1976) Z. Naturforsch. [C] 31: 55-63). Thus, ALAD proteins can be used as catalysts in synthesis of heme derivatives.
Enzymes of biosynthetic pathways generally can be used as catalysts for in vitro synthesis of the compounds representing
products of the pathway.

[0170] Polypeptides encoded by SDFs of the invention can be engineered to provide purification reagents to identify
and purify additional polypeptides that bind to them. This allows one to identify proteins that function as multimers or
elucidate signal transduction or metabolic pathways. In the case of DNA binding proteins, the polypeptide can be used
in a similar manner to identify the DNA determinants of specific binding (S. Pierrou et al., Anal. Biochem. 229:99 (1995),
S. Chusacultanachai et al., J. Biol. Chem. 274:23591 (1999), Q. Lin et al., J. Biol. Chem. 272:27274 (1997)).

II.B. POLYPEPTIDE VARIANTS, FRAGMENTS, AND FUSIONS

[0171] Generally, variants, fragments, or fusions of the polypeptides encoded by the maximum length sequence
(MLS) can exhibit at least one of the activities of the identified domains and/or related polypeptides described in Sections
(C) and (D) of REF TABLES 1 and 2 corresponding to the MLS of interest.
[...] conservation of charge, polarity, hydrophobicity, [...]. [...] sequence can be substituted with [...]
providing a hydrogen bond in an [...]. [...] are preferably made among the members of the class to which the [...]
within an exemplified sequence [...]. For example, the nonpolar (hydrophobic) amino acids include alanine, leucine,
isoleucine, valine, proline, phenylalanine, tryptophan and methionine. The polar neutral amino acids include glycine,
serine, threonine, cysteine, tyrosine, asparagine, and glutamine. The positively charged (basic) amino acids include
arginine, lysine and histidine. The negatively charged (acidic) amino acids include aspartic acid and glutamic acid.
[...] at the N-terminal and/or [...] may be deleted from the polypeptide. Amino acid substitutions may also be made
in the sequences; conservative substitutions being preferred.

[0174] One preferred class of variants are those that comprise (1) the [...] of an encoded polypeptide and/or (2)
residues conserved between the encoded polypeptide and related [...].

[0176] Yet another class of variants includes those that lack one of [...] encoded polypeptides. One example is
polypeptides [...] from genes comprising dominant negative mutations. Such a variant may comprise [...]
polypeptide sequence with non-conservative changes in a particular domain or group of conserved residues.



II.A.(2) FRAGMENTS 



II.A.(3) FUSIONS



[...] polypeptide or variants thereof of [...] MLS of the invention fused to [...] AP2 helices. The present
invention also encompasses fusions of MLS-encoded polypeptides, variants, or fragments thereof fused with related
proteins or fragments thereof.
(http://pfam.wustl.edu/browse.shtml).



(AAA) AAA-protein family signature 



[0181] A large family of ATPases has been described [...] a conserved region
of about 220 amino acids that contains an ATP-binding site [...]. Proteins
containing two AAA domains:

- Mammalian and Drosophila NSF (N-ethylmaleimide-sensitive fusion protein) and the fungal homolog, SEC18.
These proteins are involved in intracellular transport between the endoplasmic reticulum and Golgi, as well as
between different Golgi cisternae.

- Mammalian transitional endoplasmic reticulum ATPase (previously known as p97 or VCP), which is involved in the
transfer of membranes from the endoplasmic reticulum to the Golgi apparatus. This protein forms a ring-shaped
homooligomer composed of six subunits. The yeast homolog is CDC48 and it may play a role in spindle pole
proliferation.

- Yeast protein PAS1, essential for peroxisome assembly, and the related protein PAS1 from Pichia pastoris.

- Yeast protein AFG2.

- Sulfolobus acidocaldarius protein SAV and Halobacterium salinarium cdcH, which may be part of a transduction
pathway connecting light to cell division.

[0182] Proteins containing a single AAA domain: 

- Escherichia coli and other bacteria ftsH (or hflB) protein. FtsH is an ATP-dependent zinc metallopeptidase that
seems to degrade the heat-shock sigma-32 factor.

[0183] It is an integral membrane protein with a large cytoplasmic C-terminal domain that contains both the AAA and
the protease domains.

- Yeast protein YME1, a protein important for maintaining the integrity of the mitochondrial compartment. YME1 is
also a zinc-dependent protease.

- Yeast protein AFG3 (or YTA10). This protein also seems to contain an AAA domain followed by a zinc-dependent
protease domain.

[0184] Subunits from the regulatory complex of the 26S proteasome [6] which is involved in the ATP-dependent 
degradation of ubiquitinated proteins: 

a) Mammalian subunit 4 and homologs in other higher eukaryotes, in yeast (gene YTA5) and fission yeast (gene
mts2).

b) Mammalian subunit 6 (TBP7) and homologs in other higher eukaryotes and in yeast (gene YTA2).

c) Mammalian subunit 7 (MSS1) and homologs in other higher eukaryotes and in yeast (gene CIM5 or YTA3).

d) Mammalian subunit 8 (P45) and homologs in other higher eukaryotes and in yeast (SUG1 or CIM3 or TBY1)
and fission yeast (gene let1).

[0185] Other probable subunits such as human TBP1, which seems to influence HIV gene expression by interacting
with the virus tat transactivator protein, and yeast YTA1 and YTA6.

- Yeast protein BCS1, a mitochondrial protein essential for the expression of the Rieske iron-sulfur protein.

- Yeast protein MSP1, a protein involved in intramitochondrial sorting of proteins.

- Yeast protein PAS8, and the corresponding proteins PAS5 from Pichia pastoris and PAY4 from Yarrowia lipolytica.

- Mouse protein SKD1 and its fission yeast homolog (SpAC2G11.06).

- Caenorhabditis elegans meiotic spindle formation protein mei-1.

- Yeast protein SAP1.

- Yeast protein YTA7.

- Mycobacterium leprae hypothetical protein A2126A.

[0186] It is proposed that, in general, the AAA domains in these proteins act as ATP-dependent protein clamps [5].
In addition to the ATP-binding 'A' and 'B' motifs, which are located in the N-terminal half of this domain, there is a highly
conserved region located in the central part of the domain which was used to develop a signature pattern.
Consensus pattern: [LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-[NS]-x(4)-[LIVM]-D-x-A-[LIFA]-x-R
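
A consensus pattern written in this notation can be scanned against a protein sequence by translating it into a regular expression. The sketch below is one possible translation, offered as an illustration rather than as the method used to derive the pattern: `prosite_to_regex` is a hypothetical helper that handles only the elements seen in this document (single residues, `x`, bracketed alternatives, braced exclusions and repeat counts), and the test peptide is synthetic, not a real AAA protein.

```python
# Illustrative sketch: converts the subset of pattern syntax used in this document to a regex.
import re

AAA_PATTERN = "[LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-[NS]-x(4)-[LIVM]-D-x-A-[LIFA]-x-R"

_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a dash-separated consensus pattern into Python regex syntax."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex(AAA_PATTERN)
    synthetic = "MK" + "LALLAGSNAAAALDAALAR" + "GG"  # invented peptide built to fit the pattern
    hit = re.search(regex, synthetic)
    print(regex)
    print(hit.start() if hit else None)
```

A match only shows that a sequence fits the signature; as with any short motif, it does not by itself establish that the protein belongs to the family.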

[1] Froehlich K.-U., Fries H.W., Ruediger M., Erdmann R., Botstein D., Mecke D. J. Cell Biol. 114:443-453(1991).
[2] Erdmann R., Wiebel F.F., Flessau A., Rytka J., Beyer A., Froehlich K.-U., Kunau W.-H. Cell 64:499-510(1991).
[3] Peters J.-M., Walsh M.J., Franke W.W. EMBO J. 9:1757-1767(1990).
[4] Kunau W.-H., Beyer A., Goette K., Marzioch M., Saidowsky J., Skaletz-Rorowski A., Wiebel F.F. Biochimie 75:209-224(1993).
[5] Confalonieri F., Duguet M. BioEssays 17:639-650(1995).
[6] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).

2. ABC Membrane (ABC transporter transmembrane region). This family represents a unit of six transmembrane
helices. Many members of the ABC transporter family (ABC tran) have two such regions. See also descriptions of
ABC tran, below, and ABC2 membrane, above.

3. (ABC Tran) 

ABC transporters family signature 

[0187] On the basis of sequence similarities a family of related ATP-binding proteins has been characterized [1 to 5].
These proteins are associated with a variety of distinct biological processes in both prokaryotes and eukaryotes, but a
majority of them are involved in active transport of small hydrophilic molecules across the cytoplasmic membrane. All
these proteins share a conserved domain of some two hundred amino acid residues, which includes an ATP-binding
site. These proteins are collectively known as ABC transporters. Proteins known to belong to this family are listed
below (references are only provided for recently determined sequences).

In prokaryotes:

- Active transport systems components: alkylphosphonate uptake (phnC/phnK/phnL); arabinose (araG); arginine
(artP); dipeptide (dciAD; dppD/dppF); ferric enterobactin (fepC); ferrichrome (fhuC); galactoside (mglA); glutamine
(glnQ); glycerol-3-phosphate (ugpC); glycine betaine/L-proline (proV); glutamate/aspartate (gltL); histidine (hisP);
iron(III) (sfuC); iron(III) dicitrate (fecE); lactose (lacK); leucine/isoleucine/valine (braF/braG; livF/livG); maltose
(malK); molybdenum (modC); nickel (nikD/nikE); oligopeptide (amiE/amiF; oppD/oppF); peptide (sapD/sapF);
phosphate (pstB); putrescine (potG); ribose (rbsA); spermidine/putrescine (potA); sulfate (cysA); vitamin B12 (btuD).
- Hemolysin/leukotoxin export proteins hlyB, cyaB and lktB.
- Colicin V export protein cvaB.
- Lactococcin export protein lcnC [6].
- Lantibiotic transport proteins nisT (nisin) and spaT (subtilin).
- Extracellular proteases B and C export protein prtD.
- Alkaline protease secretion protein aprD.
- Beta-(1,2)-glucan export proteins chvA and ndvA.
- Haemophilus influenzae capsule-polysaccharide export protein bexA.
- Cytochrome c biogenesis proteins ccmA (also known as cycV and helA).
- Polysialic acid transport protein kpsT.
- Cell division associated ftsE protein (function unknown).
- Copper processing protein nosF from Pseudomonas stutzeri.
- Nodulation protein nodI from Rhizobium (function unknown).
- Escherichia coli proteins cydC and cydD.
- Subunit A of the ABC excision nuclease (gene uvrA).
- Erythromycin resistance protein from Staphylococcus epidermidis (gene msrA).
- Tylosin resistance protein from Streptomyces fradiae (gene tlrC) [7].
- Heterocyst differentiation protein [...] from Anabaena PCC 7120.
- Protein P29 from Mycoplasma hyorhinis, a probable component of a high affinity transport system.
- yhbG, a putative protein whose gene is linked with ntrA in many bacteria such as Escherichia coli, Klebsiella
pneumoniae, Pseudomonas putida, Rhizobium meliloti and Thiobacillus ferrooxidans.
- Escherichia coli and related bacteria hypothetical proteins yabJ, yadG, yagC, ybbA, ycjW, yddA, yehX, yejF, yheS,
yhiG, yhiH, yjcW, yjjK, yojI, yrbF and ytfR.

In eukaryotes:

- The multidrug transporters (Mdr) (P-glycoprotein), a family of closely related proteins which extrude a wide variety
of drugs out of the cell (for a review see [8]).
- Cystic fibrosis transmembrane conductance regulator (CFTR), which is most probably involved in the transport of
chloride ions.
- Antigen peptide transporters 1 (TAP1, PSF1, RING4, HAM-1, mtp1) and 2 (TAP2, PSF2, RING11, HAM-2, mtp2), which
are involved in the transport of antigens from the cytoplasm to a membrane-bound compartment for association with
MHC class I molecules.
- 70 Kd peroxisomal membrane protein (PMP70).
- ALDP, a peroxisomal protein involved in X-linked adrenoleukodystrophy [9].
- Sulfonylurea receptor [10], a putative subunit of the B-cell ATP-sensitive potassium channel.
- Drosophila proteins white (w) and brown (bw), which are involved in the import of ommatidium screening pigments.
- Fungal elongation factor 3 (EF-3).
- Yeast STE6, which is responsible for the export of the a-factor pheromone.
- Yeast mitochondrial transporter ATM1.
- Yeast MDL1 and MDL2.
- Yeast SNQ2.
- Yeast sporidesmin resistance protein [...].
- [...] protein hmt1. This protein is probably involved in the transport of metal-bound phytochelatins.
- Fission yeast brefeldin A resistance protein (gene bfr1 or [...]).
- [...] leptomycin B resistance protein (gene pmd1).
- mbpX, a hypothetical chloroplast protein from [...].
- Prestalk-specific protein tagB from slime mold. This protein consists of two domains: an N-terminal subtilase
catalytic domain and a C-terminal ABC transporter domain.

As a signature pattern for this class of proteins, a conserved region which is located between the 'A' and the 'B'
motifs of the ATP-binding site was used.

Consensus pattern: [LIVMFYC]-[SA]-[SAPGLVFYKQH]-G-[DENQMW]-[KRQASPCLIMFW]-[KRNQSTAVM]-[KRACLVM]-[LIVMFYPAN]-{PHY}-[LIVMFW]-[SAGCLIVP]-{FYWHP}-{KRHP}-[LIVMFYWSTA]

[...] EF-3. In some of those proteins [...] proteins belonging to this family also contain
one or two copies of the ATP-binding motifs 'A' and 'B'.
[1] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:571-592(1990).
[2] Higgins C.F., Gallagher M.P., Mimmack M.M., Pearce S.R. BioEssays 8:111-116(1988).
[3] Higgins C.F., Hiles I.D., Salmond G.P.C., Gill D.R., Downie J.A., Evans I.J., Holland I.B., Gray L., Buckel S.D., Bell A.W., Hermodson M.A. Nature 323:448-450(1986).
[4] Doolittle R.F., Johnson M.S., Husain I., van Houten B., Thomas D.C., Sancar A. Nature 323:451-453(1986).
[5] Blight M.A., Holland I.B. Mol. Microbiol. 4:873-880(1990).
[6] Stoddard G.W., Petzel J.P., van Belkum M.J., Kok J., McKay L.L. Appl. Environ. Microbiol. 58:1952-1961(1992).
[7] Rosteck P.R. Jr., Reynolds P.A., Hershberger C.L. Gene 102:27-32(1991).
[8] Gottesman M.M., Pastan I. J. Biol. Chem. 263:12163-12166(1988).
[9] Valle D., Gaertner J. Nature 361:682-683(1993).
[10] Aguilar-Bryan L., Nichols C.G., Wechsler S.W., Clement J.P. IV, Boyd A.E. III, Gonzalez G., Herrera-Sosa H.,
Nguy K., Bryan J., Nelson D.A. Science 268:423-426(1995).

4. (ACBP) 

Acyl-CoA-binding protein signature

[0189] Acyl-CoA-binding protein (ACBP) is a small (10 Kd) protein that binds medium- and long-chain acyl-CoA
esters with very high affinity and may function as an intracellular carrier of acyl-CoA esters [1]. ACBP is also known
as diazepam binding inhibitor (DBI) or endozepine (EP) because of its ability to displace diazepam from the benzodiazepine
(BZD) recognition site located on the GABA type A receptor. It is therefore possible that this protein also acts
as a neuropeptide to modulate the action of the GABA receptor [2]. ACBP is a highly conserved protein of about 90
residues that has been so far found in vertebrates, insects and yeast. ACBP is also related to the N-terminal section
of a probable transmembrane protein of unknown function which has been found in mammals. As a signature pattern,
the region that corresponds to residues 19 to 37 in mammalian ACBP was selected.
Consensus pattern: P-[STA]-x-[DEN]-x-[LIVMF]-x(2)-[LIVMFY]-Y-[GSTA]-x-[FY]-K-Q-[STA](2)-...

[1] Rose T.M., Schultz E.R., Todaro G.J. Proc. Natl. Acad. Sci. U.S.A. 89:11287-11291(1992).
[2] Costa E., Guidotti A. Life Sci. 49:325-344(1991).

5. (AIRS) 

AIR synthase related proteins 

[0190] This family includes hydrogenase expression/formation protein HypE, AIR synthases, FGAM synthase and
selenide, water dikinase.

6. (AMP-binding) 

Putative AMP-binding domain signature 

[0191] It has been shown [1 to 5] that a number of prokaryotic and eukaryotic enzymes, which all probably act via an
ATP-dependent covalent binding of AMP to their substrate, share a region of sequence similarity. These enzymes are:
- Insect luciferase (luciferin 4-monooxygenase). Luciferase produces light by catalyzing the oxidation of luciferin in
the presence of ATP and molecular oxygen. - Alpha-aminoadipate reductase from yeast (gene LYS2). This enzyme
catalyzes the activation of alpha-aminoadipate by ATP-dependent adenylation and the reduction of activated
alpha-aminoadipate by NADPH. - Acetate-CoA ligase (acetyl-CoA synthetase), an enzyme that catalyzes the formation of
acetyl-CoA from acetate and CoA. - Long-chain-fatty-acid-CoA ligase, an enzyme that activates long-chain fatty acids for
both the synthesis of cellular lipids and their degradation via beta-oxidation. - 4-coumarate-CoA ligase (4CL), a plant
enzyme that catalyzes the formation of 4-coumarate-CoA from 4-coumarate and coenzyme A; the branchpoint reactions
between general phenylpropanoid metabolism and pathways leading to various specific end products. - O-succinylbenzoic
acid-CoA ligase (OSB-CoA synthetase) (gene menE) [6], a bacterial enzyme involved in the biosynthesis of
menaquinone (vitamin K2). - 4-Chlorobenzoate-CoA ligase (EC 6.2.1.-) (4-CBA-CoA ligase) [7], a Pseudomonas
enzyme involved in the degradation of 4-CBA. - Indoleacetate-lysine ligase (IAA-lysine synthetase) [8], an enzyme
from Pseudomonas syringae that converts indoleacetate to IAA-lysine. - Bile acid-CoA ligase (gene baiB) from
Eubacterium strain VPI 12708 [4]. This enzyme catalyzes the ATP-dependent formation of a variety of C-24 bile acid-CoA. -
Crotonobetaine/carnitine-CoA ligase (EC 6.3.2.-) from Escherichia coli (gene caiC). - L-(alpha-aminoadipyl)-L-cystei-
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amino acids seem to be activated by adeiStfon^l ^Ii ''"^ "^^'""^"t acids 

^ J.S o, about 1000 am^ acids' - G^Wh S s^C^TrZ""^^^^ t""^ ""^-^^^ 
^es the first step h the biosynthesis o« the cycfc anS^^S Si^'""^ "^^^ cata- 
labnm - Tyrocidine synthetase I (gene tycA) Vom BadlS^« ^ ATP-depsndent racemization of pheny- 
catalyzedby grsA-Gramicidin S s^meta^ I Ze^S carried out by tycA is identical toZ 

that activates and polymerizes pS^ne. >«,^e/^^^^ 

ba«.n synthetase components E (gene entE) and Ffe^^tSSct^^ '"'"'^ ' E'«e'<^ 

■n the ATP^ependent activation of respective^ ^S^^^c^J^'^- "^''^ '"^^ i^^ed 
b.osynthes«. - Cyclic peptide antibiotic surfactin syo^^^^ff l ^ 'T* ''"""S enterobactin (enterochelin) 

contains three related domains «hile subunit 3 r^^oSJira s^^^^^^^ 

Cochliobolus carbonum. This enzyme activa^LCr Tn^n^^^^^^^ HC-tacin symhetase (gene HTS1) , J 

oxodecano« acid) that make up HC-toxin. a cyclic te JaZST K^sfJ!^^' ^''t' ^ 2-amino-9, 10-epoxi^ 
some proteins, whose exact function is not yet toi^vT bCSS«T '^"^'^ domains.There are also 

proteins are: - ORA (octapeptide-repeat a-i^ij^rpS^^'^^^IT ^'^ ^MP-binding enzymes. Th^S 

wh«h Shows a high degree of similarity with LV>^^. ^^^r'"'" "H"^ is not known but 

o be a transcriptional activator whch modubtes the ang^Sn fa^^^i? an9"'"arum protein. AngR is thought 
er operon. But it is beBeved (91 that angR is not a DNaI^^o or^rt^ "r**'"^"^ biosynthesis gene clus- 
thes« Of anguibactin. This coocluston is based on three f^S tfe?!^- ^ '"''^^ the biosyn- 

angR (1048 reskJues). which is far bigger than anyZlS^Z^ ^'^^'^m domain; the size^f 
acyl thtoesterase immedately downstream Ci3S'"^' ""^ '^^ ^''^^ « probable S- 

ug.nosa. - Escherichia coli hypothetfcal protein ydiD visfS^S T ? ■ " Pseudomonas aer- 

YBR222C. - Yeast hypothetkal protein VERuHi th^e oSf^^ 

c.ne. serine, and threonine wh^h is 16tk^ t;^^Ze^Z^T ' ^'^^ '^'"'^ '^S*^ ^^'V ''chin gV 

S ^ ^ - ■^"""•^ EMBO J. 9:2743-2750(1990) 

3 Schroeder J. Nuclefc Ackte Res. 17 460-460(1989) ^" 

( 8) FarreU D.H.. MikeseH P Actis LA rrr^^TTu ^ 

owi r-., Mciis L.A., Crosa J.H. Gene 86:45-51 (1 990). 



7. AP2 domain 



[0193] This 60 amino acid residue domain can bind to hma n i tu- ^ 

are suggested to be related to pyrldoxa^S^^DL^l^^^^^ ^" ^P^^^^^ Members of this famifv 

are also described in MuJ^ltal Z^ 

09/026,039. ^«"^'"9 U.S. Patent applications 08/700.^52. 08/879,827. oSSSJJS 

fpIS^'"^'^'^.^' ^'^"^ 1995;7:173-182 

[2] Weigel D; Plant Cell 1995;7:388-389

[3] Mushegian AR, Koonin EV; Genetics 1996;144:817-828



8. ARID 



[Omj The ARID domain rs an AT-RichlnterarrinnH^«.. u 

nucleases and po^merases. '"'^"'"^ ^'^""9 ^'"''^^"^^l homology to DNA replbation and repair 

[1]HerrscherRF, Kaplan MH Lels7ni n^^ o u 

.2] Yuan VC. WhHson RH. UuQ. '-^^JTcS:::^Z;^Z'i^'tZ!:SstZ' 
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9. (ATP synt) 



ATP synthase gamma subunit signature 



[0195] ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1,2] is a component of the cytoplasmic membrane
of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex
is composed of an oligomeric transmembrane sector, called CF(0), and a catalytic core, called coupling factor CF(1).
The former acts as a proton channel; the latter is composed of five subunits: alpha, beta, gamma, delta and epsilon.
Subunit gamma is believed to be important in regulating ATPase activity and the flow of protons through the CF(0)
complex. The best conserved region of the gamma subunit [3] is its C-terminus, which seems to be essential for
assembly and catalysis. As a signature pattern to detect ATPase gamma subunits, a 14-residue conserved segment where
the last amino acid is found one to three residues from the C-terminal extremity was used.

[0196] Consensus pattern: [IV]-T-x-E-x(2)-[DE]-x(3)-G-A-x-[SAKR]
Note: Pea chloroplast gamma and two Bacillus species gamma subunits are not detected by this motif.

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988).

[ 3] Miki J., Maeda M., Mukohata Y., Futai M. FEBS Lett. 232:221-226(1988).
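
The positional condition attached to the gamma subunit signature (the last matched residue lies one to three residues from the C-terminal extremity) can be checked separately from the pattern match itself. The following is a minimal sketch in Python, not a definitive implementation. It assumes that the regular expression is a faithful hand translation of the consensus pattern given above, and that "one to three residues from the C-terminal extremity" means that one to three residues follow the match. The function name and the test sequences are hypothetical.

```python
import re

# Hand translation of the 14-position gamma subunit consensus pattern (an assumption).
GAMMA_REGEX = re.compile(r"[IV]T.E.{2}[DE].{3}GA.[SAKR]")


def gamma_signature_hits(sequence, min_tail=1, max_tail=3):
    """Return 1-based start positions of hits followed by min_tail..max_tail residues."""
    seq = sequence.upper()
    hits = []
    for match in GAMMA_REGEX.finditer(seq):
        tail = len(seq) - match.end()  # residues remaining after the matched segment
        if min_tail <= tail <= max_tail:
            hits.append(match.start() + 1)
    return hits


# Hypothetical peptides built to contain the motif; they are not real gamma subunits.
motif = "ITKELIEIVSGAAA"
near_c_terminus = "MSLKD" + motif + "LV"        # two residues follow the motif
far_from_c_terminus = "MSLKD" + motif + "LVKRQEE"  # seven residues follow the motif

print(gamma_signature_hits(near_c_terminus))
print(gamma_signature_hits(far_from_c_terminus))
```

Keeping the positional filter outside the regular expression makes the tolerated distance easy to adjust if a different reading of the condition is preferred.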



10. (ATP Synt A)



ATP synthase a subunit signature



[0197] ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1,2] is a component of the cytoplasmic membrane
of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex
is composed of an oligomeric transmembrane sector, called CF(0), which acts as a proton channel, and a catalytic
core, termed coupling factor CF(1). The CF(0) a subunit, also called protein 6, is a key component of the proton channel;
it may play a direct role in translocating protons across the membrane. It is a highly hydrophobic protein that has been
predicted to contain 8 transmembrane regions [3]. Sequence comparison of a subunits from all available sources reveals
very few conserved regions. The best conserved region is located in what is predicted to be the fifth transmembrane
domain. This region contains three perfectly conserved residues: an arginine, a leucine and an asparagine.
Mutagenesis experiments of ATPase activity. This region was selected as a signature pattern.

Consensus pattern: [STAGN]-x-[STAG]-[LIVMF]-R-L-x-[SAGV]-N-[LIVMT] [R is important for proton translocation]

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988).

[ 3] Lewis M.J., Chang J.A., Simoni R.D. J. Biol. Chem. 265:10541-10550(1990).
[ 4] Cain B.D., Simoni R.D. J. Biol. Chem. 264:3292-3300(1989).



11. ATP synthase B



[0198] Part of the CF(0) (base unit) of the ATP synthase. The base unit is thought to translocate protons through 
membrane (inner membrane in mitochondria, thylakoid membrane in plants, cytoplasmic membrane in bacteria). The 
B subunits are thought to interact with the stalk of the CF(1) subunits. 

12. (ATP syntC) 



ATP synthase c subunit signature 




[0199] ATP synthase (proton-translocating ATPase) [1,2] is a component of the cytoplasmic membrane of eubacteria,
the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex is composed
of an oligomeric transmembrane sector, called CF(0), which acts as a proton channel, and a catalytic core, termed
coupling factor CF(1). The CF(0) c subunit (also called protein 9, proteolipid, or subunit III) [3,4] is a highly hydrophobic
protein of about 8 Kd which has been implicated in the proton-conducting activity of ATPase. Structurally, subunit c
consists of two long terminal hydrophobic regions, which probably span the membrane, and a central hydrophilic region.
N,N'-dicyclohexylcarbodiimide (DCCD) can bind covalently to subunit c and thereby abolish the ATPase activity. DCCD
binds to a specific glutamate or aspartate residue which is located in the middle of the second hydrophobic region near
the C-terminus of the protein. A signature pattern which includes the DCCD-binding residue was derived.
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10200, ^---Pa«ern:,GSTA^B-I^Q^p.oOHU^^F^^^^^^^ 

! !l ^^^"u o^' '^'P^y"'' TA. Ponomarenko S.V. Biokhimiia 56 406^1 9(1 99i ^ 
1 4J Reopon H.. Perasso R. Adoutte A.. Quetier F. J. Mol. Evol. 34:5^M2X 

13. (ATP synt DE)



ATP synthase, Delta/Epsilon chain



(D) subunit is equivalent to the oligomycin sensitive subunit (OSCP) in metazoans.
14. (ATP synt ab) 

ATP synthase alpha and beta subunits signature 



of an ollgomeric transmembrane sec^tor caJed CRo? ^^^^■^^'^T J^ ''^^^ *^ «>^P<^ed 

actsasa proton Channel: latter Is comSSo?fr;u^„te J^"H^^^^^ """^ ^^t^)- 'o-^er 

of subunits alpha and beta are related ^nJSh wTtei^ a n Jd^SSi^^ Qamma. delta and epsilon. The sequences 
catalytic activity, while the alpha chain is a^S^rsub " hT^"^ "tif "'^'^ ^ ^^P- beta chain has 
acidifying a variety of intracer^r comJ^s^Ltry" ^erUKeT^p^S ^.""^^^ '^'^""^ 
of a transmembrane and a catalytic sector The sBauln^Z !! ? L'ke F-ATPases. they are oDgomeric complexes 

tothatofF-ATPasebetasubunrt^wSiteaeilSsSrfrSm^ 
^.A-chaebacterialmembraneissociTtSASrarL^i;^ 

ATPases beta chain and the beta r=h,i„ « r^LZZ.T.X^.l^^^. " '"'^ s^untts.The alpha chain is related to F- 
beta subunits is found [5] in some b^erL a^L iiJS^iT^^f f 1'^ '"^'"^ ^^"^^ «° '^"^TPase 

Without signal peptide cleavage. This p Setif fe S^'T. « ^Pf"^'«^ P^o'ein export pathway that proceeds 

flexneri,HrpB6inXanthomonlscampeStrL^Z4^^^^^ '" Shigella 

a segment of ten amino^id residues containinVtvlrlnr'tr P^^^-^o detect these ATPase subunits. 

first serine seems to be important for ca'ta jS in ?he^^^^^ 

impairment. ^ ^^'^^^^ ^ 'east - as its mutagenesis causes catalytic 

[0203] Consensus pattern: P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S [The first S is a putative active site residue]

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[ 3] Nelson N. J. Bioenerg. Biomembr. 21:553-571(1989).

15. (ATP synt ab C)

ATP synthase ab C terminal.
[0204] Number of members: 190

SoSSti^ IsS^L^S ^^'"'"^^ " ' ' ^^-'"'^^ °' ^-ATPase from bovine heart mito 

16. (A deaminase) 

Adenosine and AMP deaminase signature 

[0205] Adenosine deaminase catalyzes the hydrolytic deamination of adenosine into inosine. AMP deaminase
catalyzes the hydrolytic deamination of AMP into IMP. It has been shown [1] that these two types of enzymes share three
regions of sequence similarities; these regions are centered on residues which are proposed to play an important role
in the catalytic mechanism of these two enzymes. One of these regions, containing two conserved aspartic acid residues
that are potential active site residues, was selected.

Consensus pattern: [SA]-[LIVM]-[NGS]-[STA]-D-D-P [The two D's are putative active site residues]
[1] Chang Z., Nygaard P., Chinault A.C., Kellems R.E. Biochemistry 30:2273-2280(1991).

17. (Acetyltransf) 
Acetyltransferase (GNAT) family.

[0206] This family contains proteins with N-acetyltransferase functions.
[1] Neuwald AF, Landsman D; Trends Biochem Sci 1997;22:154-155.

18. (Aconitase C) 
Aconitase family signature 

[0207] Aconitase (aconitate hydratase) (EC 4.2.1.3) [1] is the enzyme from the tricarboxylic acid cycle that catalyzes
the reversible isomerization of citrate and isocitrate. Cis-aconitate is formed as an intermediary product during the
course of the reaction. In eukaryotes two isozymes of aconitase are known to exist: one found in the mitochondrial
matrix and the other found in the cytoplasm. Aconitase, in its active form, contains a 4Fe-4S iron-sulfur cluster; three
cysteine residues have been shown to be ligands of the 4Fe-4S cluster. It has been shown that the aconitase family
also contains the following proteins: - Iron-responsive element binding protein (IRE-BP). IRE-BP is a cytosolic protein
that binds to iron-responsive elements (IREs). IREs are stem-loop structures found in the 5'UTR of ferritin and delta
aminolevulinic acid synthase mRNAs, and in the 3'UTR of transferrin receptor mRNA. IRE-BP also expresses aconitase
activity. - 3-isopropylmalate dehydratase (EC 4.2.1.33) (isopropylmalate isomerase), the enzyme that catalyzes the
second step in the biosynthesis of leucine. - Homoaconitase (EC 4.2.1.36) (homoaconitate hydratase), an enzyme that
participates in the alpha-aminoadipate pathway of lysine biosynthesis and that converts cis-homoaconitate into
homoisocitric acid. - Escherichia coli protein ybhJ. As a signature for proteins from the aconitase family, two conserved
regions that contain the three cysteine ligands of the 4Fe-4S cluster were selected.

Consensus pattern: [LIVM]-x(2)-[GSACIVM]-x-[LIV]-[GTIV]-[STP]-C-x(0,1)-T-N-[GSTANI]-x(4)-[LIVMA] [C binds the
iron-sulfur center]

Consensus pattern: G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-[GA] [The two C's bind the
iron-sulfur center]

[ 1] Gruer M.J., Artymiuk P.J., Guest J.R. Trends Biochem. Sci. 22:3-6(1997).

19. (Acyl-CoA dh)

Acyl-CoA dehydrogenases signatures 

[0208] Acyl-CoA dehydrogenases [1,2,3] are enzymes that catalyze the alpha,beta-dehydrogenation of acyl-CoA
esters and transfer electrons to ETF, the electron transfer protein. Acyl-CoA dehydrogenases are FAD flavoproteins.
This family currently includes: - Five eukaryotic isozymes that catalyze the first step of the beta-oxidation cycles for
fatty acids with various chain lengths. These are short (SCAD) (EC 1.3.99.2), medium (MCAD) (EC 1.3.99.3), long
(LCAD) (EC 1.3.99.13), very-long (VLCAD) and short/branched (SBCAD) chain acyl-CoA dehydrogenases. These
enzymes are located in the mitochondrion. They are all homotetrameric proteins of about 400 amino acid residues
except VLCAD, which is a dimer and which contains, in its mature form, about 600 residues. - Glutaryl-CoA
dehydrogenase (EC 1.3.99.7) (GCDH), which is involved in the catabolism of lysine, hydroxylysine and tryptophan. -
Isovaleryl-CoA dehydrogenase (EC 1.3.99.10) (IVD), involved in the catabolism of leucine. - Acyl-CoA dehydrogenases acsA and
mmgC from Bacillus subtilis. - Butyryl-CoA dehydrogenase (EC 1.3.99.2) from Clostridium acetobutylicum. -
Escherichia coli protein caiA [4]. - Escherichia coli protein aidB. Two conserved regions were selected as signature
patterns. The first is located in the center of these enzymes, the second in the C-terminal section.
Consensus pattern: [GAC]-[LIVM]-[ST]-E-x(2)-[GSAN]-G-[ST]-D-x(2)-[GSA]
Consensus pattern: [QDE]-x(2)-G-[GS]-x-G-[LIVMFY]-x(2)-[DEN]-x(4)-[KR]-x(3)-[DEN]

[ 1] Tanaka K., Ikeda Y., Matsubara Y., Hyman D.B. Enzyme 38:91-107(1987).

[ 2] Matsubara Y., Indo Y., Naito E., Ozasa H., Glassberg R., Vockley J., Ikeda Y., Kraus J., Tanaka K. J. Biol.
Chem. 264:16321-16331(1989).

[ 3] Aoyama T., Ueno I., Kamijo T., Hashimoto T. J. Biol. Chem. 269:19088-19094(1994).

[ 4] Eichler K., Bourgis F., Buchet A., Kleber H.-P., Mandrand-Berthelot M.-A. Mol. Microbiol. 13:775-786(1994).
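
Because the acyl-CoA dehydrogenase family is described by two signature patterns with expected locations (one central, one C-terminal), a candidate sequence can be screened for both and the relative position of each hit reported. The sketch below, in Python, is illustrative only. It assumes that the two regular expressions are correct hand translations of the consensus patterns above; the dictionary keys, function names and the test sequence are hypothetical and do not describe a real enzyme.

```python
import re

# Hand translations of the two consensus patterns (assumed, not authoritative).
SIGNATURES = {
    "central": re.compile(r"[GAC][LIVM][ST]E.{2}[GSAN]G[ST]D.{2}[GSA]"),
    "c_terminal": re.compile(r"[QDE].{2}G[GS].G[LIVMFY].{2}[DEN].{4}[KR].{3}[DEN]"),
}


def locate_signatures(sequence):
    """Map each signature name to the midpoint of its first hit as a fraction of length.

    0.0 corresponds to the N-terminus and 1.0 to the C-terminus; None means no hit.
    """
    seq = sequence.upper()
    report = {}
    for name, regex in SIGNATURES.items():
        match = regex.search(seq)
        if match is None:
            report[name] = None
        else:
            report[name] = (match.start() + match.end()) / 2 / len(seq)
    return report


def has_both_signatures(sequence):
    """True only when every signature in SIGNATURES is found at least once."""
    return all(pos is not None for pos in locate_signatures(sequence).values())


# Hypothetical test sequence: filler residues around one instance of each motif.
candidate = "A" * 40 + "GLSEPGAGSDAAS" + "K" * 30 + "QALGGYGYLREYPVERLYRD" + "L" * 5

print(locate_signatures(candidate))
print(has_both_signatures(candidate))
```

Requiring both hits, rather than either one, is a design choice that trades sensitivity for specificity; a looser screen could accept a single hit.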

20. (Acyl transf)

Acyl transferase domain

[0209] Number of members: 161

[1] Serre L, Verbree EC, Dauter Z, Stuitje AR, Derewenda ZS; Medline: 95286570. The Escherichia coli malonyl-CoA-

21. Acylphosphatase signatures 

^IS'^u^^ £^f^ '^'^^ °' acylphosphate cart)oxyl-phosphate 

bonds such as cajbamyl phosphate, succinylphosphate. 1.3<Jiphosphoglycerate, etc The physiolS Tole^tJfe 
enzymes no. yet dear. Acylphosphatase Is a small p«,tein of around I^Linoacid rescues 
«ozymes One seems to be specific to muscular tissues, the other, called 'organ^omnK^ fyi2 fe fourS^^ 
different trssues. While acylphosphatase have been so far onhr charac»eri7«H ^^IZT 

same activity. These proteins are: - Escherichia coli hypothetical protein yccX. - Bacillus subtilis hypothetical protein
yflL. - Archaeoglobus fulgidus hypothetical protein AF0818. Two conserved regions were selected as signature patterns.
The first is located in the N-terminal section, while the second is found in the central part of the protein sequence.
Consensus pattern: [LIV]-x-G-x-V-Q-G-V-x-[FM]-R

Consensus pattern: G-[FYW]-[AVC]-[KRQAM]-N-x(3)-G-x-V-x(5)-G

[ 1] Stefani M., Ramponi G. Life Chem. Rep. 12:271-301(1995).

[ 2] Stefani M., Taddei N., Ramponi G. Cell. Mol. Life Sci. 53:141-151(1997).

22. (Adap comp sub) 

Clathrin adaptor complexes medium chain signatures. 

[ 1] Pearse B.M., Robinson M.S. Annu. Rev. Cell Biol. 6:151-171(1990).
[ 2] Lee J., Jongeward G.D., Sternberg P.W. Genes Dev. 8:60-73(1994).

LeTs^S^ST ' """^ J. Biochem. 202: 
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23. (Adenylsucc synt) 
Adenylosuccinate synthetase signatures 

[0212] Adenylosuccinate synthetase (EC 6.3.4.4) [1] plays an important role in purine biosynthesis, by catalyzing the
GTP-dependent conversion of IMP and aspartic acid to AMP. Adenylosuccinate synthetase has been characterized
from various sources ranging from Escherichia coli (gene purA) to vertebrate tissues. In vertebrates, two isozymes are
present: one involved in purine biosynthesis and the other in the purine nucleotide cycle. Two conserved regions were
selected as signature patterns. The first one is a perfectly conserved octapeptide located in the N-terminal section and
which is involved in GTP-binding [2]. The second one includes a lysine residue known [2] to be essential for the enzyme's
activity.

Consensus pattern: Q-W-G-D-E-G-K-G

Consensus pattern: G-I-[GR]-P-x-Y-x(2)-K-x(2)-R [K is the active site residue]

[ 1] Wiesmueller L., Wittbrodt J., Noegel A.A., Schleicher M. J. Biol. Chem. 266:2480-2485(1991).

[ 2] Silva M.M., Poland B.W., Hoffman C.R., Fromm H.J., Honzatko R.B. J. Mol. Biol. 254:431-446(1995).

[ 3] Bouyoub A., Barbier G., Forterre P., Labedan B. J. Mol. Biol. 261:144-154(1996).

24. (AdoHcyase) 

S-adenosyl-L-homocysteine hydrolase signatures

[0213] S-adenosyl-L-homocysteine hydrolase (EC 3.3.1.1) (AdoHcyase) is an enzyme of the activated methyl cycle,
responsible for the reversible hydration of S-adenosyl-L-homocysteine into adenosine and homocysteine.
AdoHcyase is a ubiquitous enzyme which binds and requires NAD+ as a cofactor. AdoHcyase is a highly conserved protein
[1] of about 430 to 470 amino acids. Two highly conserved regions were selected as signature patterns. The first pattern
is located in the N-terminal section; the second is derived from a glycine-rich region in the central part of AdoHcyase,
a region thought to be involved in NAD-binding.

Consensus pattern: [GSA]-[CS]-N-x-[FYLM]-S-[ST]-[QA]-[DEN]-x-[AV]-[AT]-[AD]-[AC]-[LIVMCG]
Consensus pattern: [GA]-[KS]-x(3)-[LIV]-x-G-[FY]-G-x-[VC]-G-[KRL]-G-x-[ASC]

[ 1] Sganga M.W., Aksamit R.R., Cantoni G.L., Bauer C.E. Proc. Natl. Acad. Sci. U.S.A. 89:6328-6332(1992).

25. AhpC/TSA family 

[0214] This family contains proteins related to alkyl hydroperoxide reductase (AhpC) and thiol specific
antioxidant (TSA).

[1] Chae HZ, Robison K, Poole LB, Church G, Storz G, Rhee SG, Proc Natl Acad Sci U S A 1994;91:7017-7021 

26. (Aldose epim) 

Aldose 1-epimerase putative active site

[0215] Aldose 1-epimerase (EC 5.1.3.3) (mutarotase) is the enzyme responsible for the anomeric interconversion
of D-glucose and other aldoses between their alpha- and beta-forms. The sequence of mutarotase from two
bacteria, Acinetobacter calcoaceticus and Streptococcus thermophilus, is available [1].
It has also been shown that, on the basis of extensive sequence similarities, a mutarotase domain seems to be present
in the C-terminal half of the fungal GAL10 protein, which encodes, in the N-terminal part, UDP-glucose 4-epimerase.
The best conserved region in the sequence of mutarotase is centered around a conserved histidine residue which may
be involved in the catalytic mechanism.
Consensus pattern: [NS]-x-T-N-H-x-Y-[FW]-N-[LI]

[ 1] Poolman B., Royer T.J., Mainzer S.E., Schmidt B.F. J. Bacteriol. 172:4037-4047(1990).

27. (AlkA DNA repair) 

Alkylbase DNA glycosidases alkA family signature 

[0216] Alkylbase DNA glycosidases [1] are DNA repair enzymes that hydrolyze the deoxyribose N-glycosidic bond
to excise various alkylated bases from a damaged DNA polymer. In Escherichia coli there are two alkylbase DNA
glycosidases: one (gene tag) which is constitutively expressed and which is specific for the removal of 3-methyladenine
(EC 3.2.2.20), and one (gene alkA) which is induced during adaptation to alkylation and which can remove a variety
of alkylation products (EC 3.2.2.21). Tag and alkA do not share any region of sequence similarity. In yeast there is an
alkylbase DNA glycosidase (gene MAG1) [2,3], which can remove 3-methyladenine or 7-methyladenine and which is
structurally related to alkA. MAG and alkA are both proteins of about 300 amino acid residues. While the C- and
N-terminal ends appear to be unrelated, there is a central region of about 130 residues which is well conserved. A portion
of this region has been selected as a signature pattern.

Consensus pattern: G-I-G-x-W-[ST]-[AV]-x-[LIVMFY](2)-x-[LIVM]-x(8)-[MF]-x(2)-[ED]-D

[ 1] Lindahl T., Sedgwick B. Annu. Rev. Biochem. 57:133-157(1988).

[ 2] Berdal K.G., Bjoras M., Bjelland S., Seeberg E.C. EMBO J. 9:4563-4568(1990).

[ 3] Chen J., Derfler B., Samson L. EMBO J. 9:4569-4575(1990).

28. Ammonium transporters signature 

[0217] A number of proteins involved in the transport of ammonium ions across a membrane, as well as some yet
uncharacterized proteins, have been shown [1,2] to be evolutionarily related. These proteins are: - Yeast ammonium
transporters MEP1, MEP2 and MEP3. - Arabidopsis thaliana high affinity ammonium transporter (gene AMT1). -
Corynebacterium glutamicum ammonium and methylammonium transport system. - Escherichia coli putative ammonium
transporter amtB. - Bacillus subtilis nrgA. - Mycobacterium tuberculosis hypothetical protein MtCY338.09c. -
Synechocystis strain PCC 6803 hypothetical proteins sll0108, sll0537 and sll1017. - Methanococcus jannaschii hypothetical
proteins MJ0058 and MJ1343. - Caenorhabditis elegans hypothetical proteins C05E11.4, F49E11.3 and M195.3. As
expected by their transport function, these proteins are highly hydrophobic and seem to contain from 10 to 12
transmembrane domains. The best conserved region seems to be located in the fifth (or sixth) transmembrane region and
is used as a signature pattern.

Consensus pattern: D-[FYWS]-A-G4GSCJ-x(2)-[IVhx(3)-[SAGJ(2)-x{2)-[SAGl-[U\^ 
R 

[ 1] Ninnemann O., Jauniaux J.-C., Frommer W.B. EMBO J. 13:3464-3471(1994).

[ 2] Siewe R.M., Weil B., Burkovski A., Eikmanns B.J., Eikmanns M., Kraemer R. J. Biol. Chem. 271:5398-5403
(1996).

[ 3] Saier M.H. Jr. Adv. Microbiol. Physiol. 40:81-136(1998).

29. (Arch_histone) 
CBF/NF-Y subunits signatures 

[0218] Diverse DNA binding proteins are known to bind the CCAAT box, a common cis-acting element found in the
promoter and enhancer regions of a large number of genes in eukaryotes. Amongst these proteins is one known as
the CCAAT-binding factor (CBF) or NF-Y [1]. CBF is a heteromeric transcription factor that consists of two different
components, both needed for DNA-binding. The HAP protein complex of yeast binds to the upstream activation site of
cytochrome C iso-1 gene (CYC1) as well as other genes involved in mitochondrial electron transport and activates
their expression. It also recognizes the sequence CCAAT and is structurally and evolutionarily related to CBF. The first
subunit of CBF, known as CBF-A or NF-YB in vertebrates, HAP3 in budding yeast and as php3 in fission yeast, is a
protein of 116 to 210 amino-acid residues which contains a highly conserved central domain of about 90 residues. This
domain seems to be involved in DNA-binding; a signature pattern has been developed from its central part. The second
subunit of CBF, known as CBF-B or NF-YA in vertebrates, HAP2 in budding yeast and php2 in fission yeast, is a protein
of 265 to 350 amino-acid residues which contains a highly conserved region of about 60 residues. This region, called
the 'essential core' [2], seems to consist of two subdomains: an N-terminal subunit-association domain and a C-terminal
DNA recognition domain. A signature pattern has been developed from a section of the subunit-association domain.
Consensus pattern: C-V-S-E-x-I-S-F-[LIVM]-T-[SG]-E-A-[SC]-[DE]-[KRQ]-C
Consensus pattern: Y-V-N-A-K-Q-Y-x-R-I-L-K-R-R-x-A-R-A-K-L-E

[ 1] Li X.-Y., Mantovani R., Hooft van Huijsduijnen R., Andre I., Benoist C., Mathis D. Nucleic Acids Res. 20:
1087-1091(1992).

[ 2] Olesen J.T., Fikes J.D., Guarente L. Mol. Cell. Biol. 11:611-619(1991).

30. Argininosuccinate synthase signatures 

[0219] Argininosuccinate synthase (EC 6.3.4.5) (AS) is a urea cycle enzyme that catalyzes the penultimate step in
arginine biosynthesis: the ATP-dependent ligation of citrulline to aspartate to form argininosuccinate, AMP and
pyrophosphate [1,2]. In humans, a defect in the AS gene causes citrullinemia, a genetic disease characterized by severe
vomiting spells and mental retardation. AS is a homotetrameric enzyme of chains of about 400 amino-acid residues.
An arginine seems to be important for the enzyme's catalytic mechanism. The sequences of AS from various
prokaryotes, archaebacteria and eukaryotes show significant similarity. Two signature patterns have been selected for AS.
The first is a highly conserved stretch of nine residues located in the N-terminal extremity of these enzymes; the second
is derived from a conserved region which contains one of the conserved arginine residues.
Consensus pattern: [AS]-[FY]-S-G-G-[LV]-D-T-[ST]
Consensus pattern: G-x-T-x-K-G-N-D-x(2)-R-F

[ 1] van Vliet F., Crabeel M., Boyen A., Tricot C., Stalon V., Falmagne P., Nakamura Y., Baumberg S., Glansdorff
N. Gene 95:99-104(1990).

[ 2] Morris C.J., Reeve J.N. J. Bacteriol. 170:3125-3130(1988).
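
The consensus patterns in this section use a hyphen-separated notation in which `x` stands for any residue, `[...]` lists the allowed residues, `{...}` lists excluded residues, and `(n)` or `(n,m)` gives a repeat count. As a minimal sketch, assuming Python and its standard `re` module, such a pattern can be translated into a regular expression and scanned against a protein sequence. The function names and the test peptide are hypothetical illustrations and are not part of the disclosed method.

```python
import re


def prosite_to_regex(pattern):
    """Translate a hyphen-separated consensus pattern into a Python regex string."""
    parts = []
    # A leading or trailing hyphen carries no residue information and is ignored.
    for element in pattern.strip().strip("-").split("-"):
        element = element.strip()
        # Separate an optional repeat suffix such as (2) or (2,3) from the residue spec.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError("empty pattern element")
        spec, low, high = m.group(1), m.group(2), m.group(3)
        if spec == "x":
            core = "."                      # any residue
        elif spec.startswith("[") and spec.endswith("]"):
            core = "[" + spec[1:-1] + "]"   # allowed residues
        elif spec.startswith("{") and spec.endswith("}"):
            core = "[^" + spec[1:-1] + "]"  # excluded residues
        elif len(spec) == 1 and spec.isalpha():
            core = spec                     # a single fixed residue
        else:
            raise ValueError("unrecognized pattern element: %r" % element)
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_signature(pattern, sequence):
    """Return (1-based start, matched text) for every non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence.upper())]


# Second argininosuccinate synthase pattern from the text above.
AS_PATTERN = "G-x-T-x-K-G-N-D-x(2)-R-F"
# Hypothetical peptide constructed to contain one hit; not a real AS sequence.
peptide = "MKAAGLTAKGNDQVRFEL"

print(prosite_to_regex(AS_PATTERN))
print(find_signature(AS_PATTERN, peptide))
```

Note that `finditer` reports non-overlapping hits only; overlapping occurrences would need a lookahead-based scan.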
31. Armadillo/beta-catenin-like repeats

[0220] Approx. 40 amino acid repeat. Tandem repeats form a super-helix of helices that is proposed to mediate
interaction of beta-catenin with its ligands. CAUTION: This family does not contain all known armadillo repeats.

[1] Huber AH, Nelson WJ, Weis WI, Cell 1997;90:871-882.

[2] Gumbiner BM, Curr Opin Cell Biol 1995;7:634-640.

[3] Cavallo R, Rubenstein D, Peifer M, Curr Opin Genet Dev 1997;7:459-466.

[4] Su LK, Vogelstein B, Kinzler KW, Science 1993;262:1734-1737.

[5] Masiarz FR, Munemitsu S, Polakis P, Science 1993;262:1731-1734.
[6] Peifer M, Wieschaus E, Cell 1990;63:1167-1176.

32. (Asn Synthase) 
Asparagine synthase 






[0221] This family is always found associated with GATase 2. Members of this family catalyse the conversion of 
aspartate to asparagine. 

33. Asparaginase.2 
Asparaginase 12 members 

34. (Aspartyl tRNA N) 

Aminoacyl-transfer RNA synthetases class-II signatures

[0222] Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer
them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least
twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are
generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form.
While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary
structure. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine,
proline, serine, and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding
pattern in their catalytic domain for the binding of ATP and amino acid which is different to the Rossmann fold observed
for the class I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least
three conserved regions are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[ 2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[ 3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).

[ 4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[ 5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).
[ 6] Cusack S. Biochimie 75:1077-1081(1993).

[ 7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature 347:249-255(1990).
[ 8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).

35. (Arf Gap) Putative GTP-ase activating protein for Arf. Putative zinc fingers with GTPase activating proteins (GAPs)
towards the small GTPase, Arf. The GAP of ARD1 stimulates GTPase hydrolysis for ARD1 but not ARFs. Number of
members: 34



[0223]

[1] Medline: 96324970. Identification and cloning of centaurin-alpha. A novel phosphatidylinositol
3,4,5-trisphosphate-binding protein from rat brain. Hammonds-Odie LP, Jackson TR, Profit AA, Blader IJ, Turck CW, Prestwich
GD, Theibert AB; J Biol Chem 1996;271:18859-18868.
[2] Medline: 97296423. A target of phosphatidylinositol 3,4,5-trisphosphate with a zinc finger motif similar to that
of the ADP-ribosylation-factor GTPase-activating protein and two pleckstrin homology domains. Tanaka K,
Imajoh-Ohmi S, Sawada T, Shirai R, Hashimoto Y, Iwasaki S, Kaibuchi K, Kanaho Y, Shirai T, Terada Y, Kimura K, Nagata
S, Fukui Y; Eur J Biochem 1997;245:512-519.

[3] Medline: 98112795. Molecular characterization of the GTPase-activating domain of ADP-ribosylation factor domain
protein 1 (ARD1). Vitale N, Moss J, Vaughan M; J Biol Chem 1998;273:2553-2560.

36. Apolipoprotein. Apolipoprotein A1/A4/E family. This family includes: Swiss:P02647 Apolipoprotein A-I. Swiss:
P06727 Apolipoprotein A-IV. Swiss:P02649 Apolipoprotein E. These proteins contain several 22 residue repeats which
form a pair of alpha helices. Number of members: 42

[0224] [1] Medline: 91289138. Three-dimensional structure of the LDL receptor-binding domain of human
apolipoprotein E. Wilson C, Wardell MR, Weisgraber KH, Mahley RW, Agard DA; Science 1991;252:1817-1822.

37. Amino acid permeases signature

[0225] Amino acid permeases are integral membrane proteins involved in the transport of amino acids into the cell.
A number of such proteins have been found to be evolutionarily related [1,2,3]. These proteins are: - Yeast general
amino acid permeases (genes GAP1, AGP2 and AGP3). - Yeast basic amino acid permease (gene ALP1). - Yeast
Leu/Val/Ile permease (gene BAP2). - Yeast arginine permease (gene CAN1). - Yeast dicarboxylic amino acid permease
(gene DIP5). - Yeast asparagine/glutamine permease (gene AGP1). - Yeast glutamine permease (gene GNP1). - Yeast
histidine permease (gene HIP1). - Yeast lysine permease (gene LYP1). - Yeast proline permease (gene PUT4). - Yeast
valine and tyrosine permease (gene VAL1/TAT1). - Yeast tryptophan permease (gene TAT2/SCM2). - Yeast choline
transport protein (gene HNM1/CTR1). - Yeast GABA permease (gene UGA4). - Yeast hypothetical protein YKL174c.
- Fission yeast protein isp5. - Fission yeast hypothetical protein SpAC8A4.11. - Fission yeast hypothetical protein
SpAC11D3.08c. - Emericella nidulans proline transport protein (gene prnB). - Trichoderma harzianum amino acid permease
INDA1. - Salmonella typhimurium L-asparagine permease (gene ansP). - Escherichia coli aromatic amino acid
transport protein (gene aroP). - Escherichia coli D-serine/D-alanine/glycine transporter (gene cycA). - Escherichia coli
GABA permease (gene gabP). - Escherichia coli lysine-specific permease (gene lysP). - Escherichia coli phenylalanine-specific
permease (gene pheP). - Salmonella typhimurium proline-specific permease (gene proY). - Escherichia coli
and Klebsiella pneumoniae hypothetical protein yeeF. - Escherichia coli and Salmonella typhimurium hypothetical protein
yifK. - Bacillus subtilis permeases rocC and rocE, which probably transport arginine or ornithine. These proteins
seem to contain up to 12 transmembrane segments. As a signature for this family of proteins, the best conserved
region, which is located in the second transmembrane segment, has been selected.

Consensus pattern: [STAGC]-G-[PAG]-x(2,3)-[LIVMFYWA](2)-x-[LIVMFYW]-x-[LIVMFWSTAGC](2)-[STAGC]-x(3)-
[LIVMFYWT]-x-[LIVMST]-x(3)-[LIVMCTA]-[GA]-E-x(5)-[PSAL]

[ 1] Weber E., Chevalier M.R., Jund R. J. Mol. Evol. 27:341-350(1988).
[ 2] Vandenbol M., Jauniaux J.-C., Grenson M. Gene 83:153-159(1989).

[ 3] Reizer J., Finley K., Kakuda D., McLeod C.L., Reizer A., Saier M.H. Jr. Protein Sci. 2:20-30(1993).
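
The consensus patterns in these entries use a compact notation: `x` is any residue, `[...]` lists the allowed residues, `{...}` lists excluded residues, and `(n)` or `(n,m)` gives a repeat count. As an illustration only, the following minimal Python sketch translates such a pattern into a regular expression. The function name `prosite_to_regex`, the transcription of the permease pattern, and the toy peptide are assumptions made for this example and are not part of the original disclosure.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip("-.").split("-"):
        # Separate an optional repeat suffix such as (2) or (2,3) from the element.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."                        # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                       # one of the listed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            token = re.escape(core)            # a single fixed residue
        if lo is not None:
            token += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(token)
    return "".join(parts)

# Amino acid permease signature as read from the entry above (assumed transcription).
PERMEASE = ("[STAGC]-G-[PAG]-x(2,3)-[LIVMFYWA](2)-x-[LIVMFYW]-x-[LIVMFWSTAGC](2)-"
            "[STAGC]-x(3)-[LIVMFYWT]-x-[LIVMST]-x(3)-[LIVMCTA]-[GA]-E-x(5)-[PSAL]")
PERMEASE_RE = re.compile(prosite_to_regex(PERMEASE))

# Hand-built toy peptide that satisfies the pattern; it is not a real protein sequence.
toy = ("SGP" + "QQ" + "LL" + "Q" + "L" + "Q" + "LL" + "S" + "QQQ" + "L" + "Q" + "L"
       + "QQQ" + "L" + "G" + "E" + "QQQQQ" + "P")
hit = PERMEASE_RE.search("MK" + toy + "RR")
print(hit is not None)
```

A regular expression is used here only because the notation maps onto it element by element; dedicated motif-scanning tools may handle overlapping hits and anchoring differently.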
38. aakinase (1) Glutamate 5-kinase signature

[0226] Glutamate 5-kinase (EC 2.7.2.11) (gamma-glutamyl kinase) (GK) is the enzyme that catalyzes the first step
in the biosynthesis of proline from glutamate, the ATP-dependent phosphorylation of L-glutamate into L-glutamate
5-phosphate. In eubacteria (gene proB) and yeast [1] (gene PRO1), GK is a monofunctional protein, while in plants
and mammals, it is a bifunctional enzyme (P5CS) [2] that consists of two domains: an N-terminal GK domain and a
C-terminal gamma-glutamyl phosphate reductase domain (EC 1.2.1.41) (see <PDOC00940>). As a signature pattern, a
highly conserved glycine- and alanine-rich region located in the central section of these enzymes has been selected.
Yeast hypothetical protein YHR033w is highly similar to GK.

Consensus pattern: [GSTN]-x(2)-G-x-G-[GC]-[IM]-x-[STA]-K-[LIVM]-x-[SA]-[TCA]-x(2)-[GALV]-x(3)-G
[ 1] Li W., Brandriss M.C. J. Bacteriol. 174:4148-4156(1992).

[ 2] Hu C.-A.A., Delauney A.J., Verma D.P.S. Proc. Natl. Acad. Sci. U.S.A. 89:9354-9358(1992).
aakinase (2) Aspartokinase signature 

[0227] Aspartokinase (EC 2.7.2.4) (AK) [1] catalyzes the phosphorylation of aspartate. The product of this reaction
can then be used in the biosynthesis of lysine or in the pathway leading to homoserine, which participates in the
biosynthesis of threonine, isoleucine and methionine. In Escherichia coli, there are three different isozymes which differ
in their sensitivity to repression and inhibition by Lys, Met and Thr. AK1 (gene thrA) and AK2 (gene metL) are bifunctional
enzymes which both consist of an N-terminal AK domain and a C-terminal homoserine dehydrogenase domain. AK1
is involved in threonine biosynthesis and AK2 in that of methionine. The third isozyme, AK3 (gene lysC), is monofunctional
and involved in lysine synthesis. In yeast, there is a single isozyme of AK (gene HOM3). As a signature pattern
for AK, a conserved region located in the N-terminal extremity has been selected.
Consensus pattern: [LIVM]-x-K-[FY]-G-G-[ST]-[SC]-[LIVM]
[ 1] Rafalski J.A., Falco S.C. J. Biol. Chem. 263:2146-2151(1988).

aakinase (3) Gamma-glutamyl phosphate reductase signature 

[0228] Gamma-glutamyl phosphate reductase (EC 1.2.1.41) (GPR) is the enzyme that catalyzes the second step in
the biosynthesis of proline from glutamate, the NADP-dependent reduction of L-glutamate 5-phosphate into L-glutamate
5-semialdehyde and phosphate. In eubacteria (gene proA) and yeast [1] (gene PRO2), GPR is a monofunctional
protein, while in plants and mammals, it is a bifunctional enzyme (P5CS) [2] that consists of two domains: an N-terminal
glutamate 5-kinase domain (EC 2.7.2.11) (see <PDOC00701>) and a C-terminal GPR domain. As a signature pattern,
a conserved region that contains two histidine residues has been selected. This region is located in the last third of GPR.
Consensus pattern: V-x(5)-A-[LIV]-x-H-I-x(2)-[HY]-[GS]-[ST]-x-H-[ST]-[DE]-x-I

[ 1] Pearson B.M., Hernando Y., Payne J., Wolf S.S., Kalogeropoulos A., Schweizer M. Yeast 12:1021-1031(1996).
[ 2] Hu C.-A.A., Delauney A.J., Verma D.P.S. Proc. Natl. Acad. Sci. U.S.A. 89:9354-9358(1992).

39. (abhydrolase) alpha/beta hydrolase fold. This catalytic domain is found in a very wide range of enzymes.

[0229] [1] Ollis DL, Cheah E, Cygler M, Dijkstra B, Frolow F, Franken SM, Harel M, Remington SJ, Silman I, Schrag
J, Sussman JL, Verschueren KHG, Goldman A; Protein Eng 1992;5:197-211.

40. (Acid phosphat) Histidine acid phosphatases signatures 

[0230] Acid phosphatases (EC 3.1.3.2) are a heterogeneous group of proteins that hydrolyze phosphate esters,
optimally at low pH. It has been shown [1] that a number of acid phosphatases, from both prokaryotes and eukaryotes,
share two regions of sequence similarity, each centered around a conserved histidine residue. These two histidines
seem to be involved in the enzymes' catalytic mechanism [2,3]. The first histidine is located in the N-terminal section
and forms a phosphohistidine intermediate while the second is located in the C-terminal section and possibly acts as
proton donor. Enzymes belonging to this family are called 'histidine acid phosphatases' and are listed below:

- Escherichia coli pH 2.5 acid phosphatase (gene appA).
- Escherichia coli glucose-1-phosphatase (EC 3.1.3.10) (gene agp).
- Yeast constitutive and repressible acid phosphatases (genes PHO3 and PHO5).
- Fission yeast acid phosphatase (gene pho1).
- Aspergillus phytases A and B (EC 3.1.3.8) (gene phyA and phyB).
- Mammalian lysosomal acid phosphatase.
- Mammalian prostatic acid phosphatase.
- Caenorhabditis elegans hypothetical proteins B0361.7, C05C10.1, C05C10.4 and F26C11.1.

[0231] Consensus pattern: [LIVM]-x(2)-[LIVMA]-x(2)-[LIVM]-x-R-H-[GN]-x-R-x-[PAS] [H is the phosphohistidine residue]

Consensus pattern: [LIVMF]-x-[LIVMFAG]-x(2)-[STAGI]-H-D-[STANQ]-x-[LIVM]-x(2)-[LIVMFY]-x(2)-[STA] [H is an
active site residue]

Sequences known to belong to this class detected by the pattern: ALL, except for rat prostatic acid
phosphatase, which seems to have Tyr instead of the active site His.

[ 1] van Etten R.L., Davidson R., Stevis P.E., MacArthur H., Moore D.L. J. Biol. Chem. 266:2313-2319(1991).
[ 2] Ostanin K., Harms E.H., Stevis P.E., Kuciel R., Zhou M.-M., van Etten R.L. J. Biol. Chem. 267:22830-22836(1992).

[ 3] Schneider G., Lindqvist Y., Vihko P. EMBO J. 12:2609-2615(1993).

41. Aconitase family signatures 

course o, the reactioa InTuC^i^i^o, ^LTL^iS^.o" ""t^^^^^ 
matrix and the other found in the cvtoDtesm^^^^ ^ . f '"^'^ " *® mitochondrial 

cysteine rescues haveTln s^^H b^Jga^S^ ^F^^^^^ 4Pe^ i^.u«ur clusten three 

ateo contains the .oltowing proteins: - .ron-reTpc^sre J^i^^ ^j^r,^^^^ TT '"""^ 

that b^ds to l^on-raspons^,e elements (IREs). IREs are ^^^7r^^s ^l^^!;^l^,TT: S h 7 
am.nolevulin.c acid synthase mRNAs. and in the 3'UTR of tran^in recepTw mRNTlRlip^^ ' ''^^ 

activity. - 3-lsopropylmalate dehvdratasa fEC 4 p i "RE-BP also express aconitase 

second step in m^^UynthesT^tle iS^ i^^^^^ "^^'^ 
participates in the alpha^noadipate path^TS^S wS^IS ""^f^^' ^ 

moisocitric acid. - Esherichia coli protein ybhT '"*«y"«^««« and that converts cis4wmoaconitate into hc^ 

'o~c IC t>.ds the 
fuTZ::^ ^-(2HLIVVW>Q^x(3HGACJ.C^GSTA^^^L.^«>TAK^^^ two C-s bind 0,e ««-sul- 

[ 1] Gruer M.J., Artymiuk P.J., Guest J.R. Trends Biochem. Sci. 22:3-6(1997).
42. Actins signatures 

are a major c<J,S^, c^SZS a^StufT;«^^^^ ^ 

probably involved in a variety of functions ^i.rh a« r^r^i^r . ' ' "^"^ isoforms which are 

perceptL. ceil wall depos J^e^ S eS eim^ S^a m*2^irZ rsi^^T «P 9-«th. gravi- 

Each actin monomer canbind a molecu e JaTP lln .^^Z^o. ^ "^■"^ " ^ PO'y^*^'^** <o™ F-actin). 
Of from 374 to 379 amino acW rSes Theirru^^e SSr^T.ri"'!;^ ''^ ^ 
Recently some divergent actin-like proteiS h^iTteln JifT^ ^ °' ^^o'^^o"- 

(actin-RPV)frommam,^ternqi ^^^^^ ^ 'P^'^'^^ "^''^^^ are: - Centractin 

seemstobeacompS,™ jaS su^^^^^^ 

subfamiV is al^ZTas AR?r ARP2 luW^^T^ T^^^ '^"'"'^ microtubule based vesicle motifity. This 
elegansLc. " ARPsTub^iZ:.*™"^^ -D. C. 

act2. - ARP4 subfamily which includes veast Arr^ n^^ ""^"^^^^fosopma 66B. yeast ACT4 and frssion yeast 

The first two are specL ^^S^^ £ to STetr '^^^ ^" 

and the actin-like proteins and correspj^s ^positions Soe^oTeTal?' "'"^ ""^ 

Consensus pattern: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G
Consensus pattern: W-[IV]-[STA]-[RK]-x-[DE]-Y-[DNE]-[DE]

Consensus pattern: [LM]-[LIVM]-T-E-[GAPQ]-x-[LIVMFYWHQ]-N-[PSTAQ]-x(2)-N-[KR]

\ 2\ ?Jl^^7n 'rJ^°":'^ ^ P^ess Ltd. London (1996) 

„ ^ Biochem. 55:987-1036(1986) 

[ 3] Pollard T.D. Curr. Opin. Cell Biol. 1:33-40(1990).
[ 4] Rubenstein P.A. BioEssays 12:309-315(1990).

[ 5] Meagher R.B., McLean B.G. Cell Motil. Cytoskeleton 16:164-166(1990).

43. Adenylate kinase signature 

[0234] Adenylate kinase (EC 2.7.4.3) (AK) [1] is a small monomeric enzyme that catalyzes the reversible transfer of
MgATP to AMP (MgATP + AMP = MgADP + ADP). In mammals there are three different isozymes: - AK1 (or myokinase),
which is cytosolic. - AK2, which is located in the outer compartment of mitochondria. - AK3 (or GTP:AMP phosphotransferase),
which is located in the mitochondrial matrix and which uses MgGTP instead of MgATP. The sequence of
AK has also been obtained from different bacterial species and from plants and fungi. Two other enzymes have been
found to be evolutionarily related to AK. These are: - Yeast uridylate kinase (EC 2.7.4.-) (UK) (gene URA6) [2], which
catalyzes the transfer of a phosphate group from ATP to UMP to form UDP and ADP. - Slime mold UMP-CMP kinase
(EC 2.7.4.14) [3], which catalyzes the transfer of a phosphate group from ATP to either CMP or UMP to form CDP or
UDP and ADP. Several regions of AK family enzymes are well conserved, including the ATP-binding domains. The
most conserved of all regions has been selected as a signature for this type of enzyme. This region includes an
aspartic acid residue that is part of the catalytic cleft of the enzyme and that is involved in a salt bridge. It also includes
an arginine residue whose modification leads to inactivation of the enzyme.
Consensus pattern: [LIVMFYW](3)-D-G-[FYI]-P-R-x(3)-[NQ]

[ 1] Schulz G.E. Cold Spring Harbor Symp. Quant. Biol. 52:429-439(1987).

[ 2] Liljelund P., Sanni A., Friesen J.D., Lacroute F. Biochem. Biophys. Res. Commun. 165:464-473(1989).
[ 3] Wiesmueller L., Noegel A.A., Barzu O., Gerisch G., Schleicher M. J. Biol. Chem. 265:6339-6345(1990).
[ 4] Kath Th., Schmid R., Schaefer G. Arch. Biochem. Biophys. 307:405-410(1993).

[0235] 44. (adh_short) Short-chain dehydrogenases/reductases family signature.

[0236] The short-chain dehydrogenases/reductases family (SDR) [1] is a very large family of enzymes, most of which
are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterized
was Drosophila alcohol dehydrogenase, this family used to be called [2,3,4] 'insect-type', or 'short-chain' alcohol
dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. The proteins currently
known to belong to this family are listed below. - Alcohol dehydrogenase (EC 1.1.1.1) from insects such as Drosophila.
- Acetoin dehydrogenase (EC 1.1.1.5) from Klebsiella terrigena (gene budC). - D-beta-hydroxybutyrate dehydrogenase
(BDH) (EC 1.1.1.30) from mammals. - Acetoacetyl-CoA reductase (EC 1.1.1.36) from various bacterial species (gene
phbB or phaB). - Glucose 1-dehydrogenase (EC 1.1.1.47) from Bacillus. - 3-beta-hydroxysteroid dehydrogenase (EC
1.1.1.51) from Comamonas testosteroni. - 20-beta-hydroxysteroid dehydrogenase (EC 1.1.1.53) from Streptomyces
hydrogenans. - Ribitol dehydrogenase (EC 1.1.1.56) (RDH) from Klebsiella aerogenes. - Estradiol 17-beta-dehydrogenase
(EC 1.1.1.62) from human. - Gluconate 5-dehydrogenase (EC 1.1.1.69) from Gluconobacter oxydans (gene
gno). - 3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100) from Escherichia coli (gene fabG) and from plants. -
Retinol dehydrogenase (EC 1.1.1.105) from mammals. - 2-deoxy-d-gluconate 3-dehydrogenase (EC 1.1.1.125) from
Escherichia coli and Erwinia chrysanthemi (gene kduD). - Sorbitol-6-phosphate 2-dehydrogenase (EC 1.1.1.140) from
Escherichia coli (gene gutD) and from Klebsiella pneumoniae (gene sorD). - 15-hydroxyprostaglandin dehydrogenase
(NAD+) (EC 1.1.1.141) from human. - Corticosteroid 11-beta-dehydrogenase (EC 1.1.1.146) (11-DH) from mammals.
- 7-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.159) from Escherichia coli (gene hdhA), Eubacterium strain VPI
12708 (gene baiA) and from Clostridium sordellii. - NADPH-dependent carbonyl reductase (EC 1.1.1.184) from mammals.
- Tropinone reductase-I (EC 1.1.1.206) and -II (EC 1.1.1.236) from plants. - N-acylmannosamine 1-dehydrogenase
(EC 1.1.1.233) from Flavobacterium strain 141-8. - D-arabinitol 2-dehydrogenase (ribulose forming) (EC
1.1.1.250) from fungi. - Tetrahydroxynaphthalene reductase (EC 1.1.1.252) from Magnaporthe grisea. - Pteridine reductase
1 (EC 1.1.1.253) (gene PTR1) from Leishmania. - 2,5-dichloro-2,5-cyclohexadiene-1,4-diol dehydrogenase
(EC 1.1.-.-) from Pseudomonas paucimobilis. - Cis-1,2-dihydroxy-3,4-cyclohexadiene-1-carboxylate dehydrogenase
(EC 1.3.1.-) from Acinetobacter calcoaceticus (gene benD) and Pseudomonas putida (gene xylL). - Biphenyl-2,3-dihydro-2,3-diol
dehydrogenase (EC 1.3.1.-) (gene bphB) from various Pseudomonaceae. - Cis-toluene dihydrodiol dehydrogenase
(EC 1.3.1.-) from Pseudomonas putida (gene todD). - Cis-benzene glycol dehydrogenase (EC 1.3.1.19)
from Pseudomonas putida (gene bnzE). - 2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase (EC 1.3.1.28) from Escherichia
coli (gene entA) and Bacillus subtilis (gene dhbA). - Dihydropteridine reductase (EC 1.6.99.7) (HDHPR) from
mammals. - Lignin degradation enzyme ligD from Pseudomonas paucimobilis. - Agropine synthesis reductase from
Agrobacterium plasmids (gene mas1). - Versicolorin reductase from Aspergillus parasiticus (gene VER1). - Putative
keto-acyl reductases from Streptomyces polyketide biosynthesis operons. - A trifunctional hydratase-dehydrogenase-epimerase
from the peroxisomal beta-oxidation system of Candida tropicalis. This protein contains two tandemly repeated
'short-chain dehydrogenase-type' domains in its N-terminal extremity. - Nodulation protein nodG from species
of Azospirillum and Rhizobium which is probably involved in the modification of the nodulation Nod factor fatty acyl
chain. - Nitrogen fixation protein fixR from Bradyrhizobium japonicum. - Bacillus subtilis protein dltE which is involved
in the biosynthesis of D-alanyl-lipoteichoic acid. - Human follicular variant translocation protein 1 (FVT1). - Mouse
adipocyte protein p27. - Mouse protein Ke 6. - Maize sex determination protein TASSELSEED 2. - Sarcophaga peregrina
25 Kd development specific protein. - Drosophila fat body protein P6. - A Listeria monocytogenes hypothetical
protein encoded in the internalins gene region. - Escherichia coli hypothetical protein yciK. - Escherichia coli hypothetical
protein ydfG. - Escherichia coli hypothetical protein yjgI. - Escherichia coli hypothetical protein yjgU. - Escherichia
coli hypothetical protein yohF. - Bacillus subtilis hypothetical protein yoxD. - Bacillus subtilis hypothetical protein ywfD.
- Bacillus subtilis hypothetical protein ywfH. - Yeast hypothetical protein YIL124w. - Yeast hypothetical protein YIR035c.
- Yeast hypothetical protein YIR036c. - Yeast hypothetical protein YKL055c. - Fission yeast hypothetical protein
SpAC23D3.11. One of the best conserved regions, which includes two perfectly conserved residues, a tyrosine and a
lysine, has been used as a signature pattern for this family of proteins. The tyrosine residue participates in the catalytic
mechanism.

Consensus pattern: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFYR]-[LIVMSTAGD]-
x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM] [Y is an active site residue]

[ 1] Joernvall H., Persson B., Krook M., Atrian S., Gonzalez-Duarte R., Jeffery J., Ghosh D. Biochemistry 34:
6003-6013(1995).

[ 2] Villarroya A., Juan E., Egestad B., Joernvall H. Eur. J. Biochem. 180:191-197(1989).
[ 3] Persson B., Krook M., Joernvall H. Eur. J. Biochem. 200:537-543(1991).

[ 4] Neidle E.L., Hartnett C., Ornston N.L., Bairoch A., Rekik M., Harayama S. Eur. J. Biochem. 204:113-120(1992).
[0237] 45. (adh_short_C2) Short-chain dehydrogenases/reductases family signature

The short-chain dehydrogenases/reductases family (SDR) [1] is a very large family of enzymes, most of which are
known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterized was
Drosophila alcohol dehydrogenase, this family used to be called [2,3,4] 'insect-type', or 'short-chain' alcohol
dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. The proteins currently
known to belong to this family are listed below. - Alcohol dehydrogenase (EC 1.1.1.1) from insects such as Drosophila.
- Acetoin dehydrogenase (EC 1.1.1.5) from Klebsiella terrigena (gene budC). - D-beta-hydroxybutyrate dehydrogenase
(BDH) (EC 1.1.1.30) from mammals. - Acetoacetyl-CoA reductase (EC 1.1.1.36) from various bacterial species (gene
phbB or phaB). - Glucose 1-dehydrogenase (EC 1.1.1.47) from Bacillus. - 3-beta-hydroxysteroid dehydrogenase (EC
1.1.1.51) from Comamonas testosteroni. - 20-beta-hydroxysteroid dehydrogenase (EC 1.1.1.53) from Streptomyces
hydrogenans. - Ribitol dehydrogenase (EC 1.1.1.56) (RDH) from Klebsiella aerogenes. - Estradiol 17-beta-dehydrogenase
(EC 1.1.1.62) from human. - Gluconate 5-dehydrogenase (EC 1.1.1.69) from Gluconobacter oxydans (gene
gno). - 3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100) from Escherichia coli (gene fabG) and from plants. -
Retinol dehydrogenase (EC 1.1.1.105) from mammals. - 2-deoxy-d-gluconate 3-dehydrogenase (EC 1.1.1.125) from
Escherichia coli and Erwinia chrysanthemi (gene kduD). - Sorbitol-6-phosphate 2-dehydrogenase (EC 1.1.1.140) from
Escherichia coli (gene gutD) and from Klebsiella pneumoniae (gene sorD). - 15-hydroxyprostaglandin dehydrogenase
(NAD+) (EC 1.1.1.141) from human. - Corticosteroid 11-beta-dehydrogenase (EC 1.1.1.146) (11-DH) from mammals.
- 7-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.159) from Escherichia coli (gene hdhA), Eubacterium strain VPI
12708 (gene baiA) and from Clostridium sordellii. - NADPH-dependent carbonyl reductase (EC 1.1.1.184) from mammals.
- Tropinone reductase-I (EC 1.1.1.206) and -II (EC 1.1.1.236) from plants. - N-acylmannosamine 1-dehydrogenase
(EC 1.1.1.233) from Flavobacterium strain 141-8. - D-arabinitol 2-dehydrogenase (ribulose forming) (EC
1.1.1.250) from fungi. - Tetrahydroxynaphthalene reductase (EC 1.1.1.252) from Magnaporthe grisea. - Pteridine reductase
1 (EC 1.1.1.253) (gene PTR1) from Leishmania. - 2,5-dichloro-2,5-cyclohexadiene-1,4-diol dehydrogenase
(EC 1.1.-.-) from Pseudomonas paucimobilis. - Cis-1,2-dihydroxy-3,4-cyclohexadiene-1-carboxylate dehydrogenase
(EC 1.3.1.-) from Acinetobacter calcoaceticus (gene benD) and Pseudomonas putida (gene xylL). - Biphenyl-2,3-dihydro-2,3-diol
dehydrogenase (EC 1.3.1.-) (gene bphB) from various Pseudomonaceae. - Cis-toluene dihydrodiol dehydrogenase
(EC 1.3.1.-) from Pseudomonas putida (gene todD). - Cis-benzene glycol dehydrogenase (EC 1.3.1.19)
from Pseudomonas putida (gene bnzE). - 2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase (EC 1.3.1.28) from Escherichia
coli (gene entA) and Bacillus subtilis (gene dhbA). - Dihydropteridine reductase (EC 1.6.99.7) (HDHPR) from
mammals. - Lignin degradation enzyme ligD from Pseudomonas paucimobilis. - Agropine synthesis reductase from
Agrobacterium plasmids (gene mas1). - Versicolorin reductase from Aspergillus parasiticus (gene VER1). - Putative
keto-acyl reductases from Streptomyces polyketide biosynthesis operons. - A trifunctional hydratase-dehydrogenase-epimerase
from the peroxisomal beta-oxidation system of Candida tropicalis. This protein contains two tandemly repeated
'short-chain dehydrogenase-type' domains in its N-terminal extremity. - Nodulation protein nodG from species
of Azospirillum and Rhizobium which is probably involved in the modification of the nodulation Nod factor fatty acyl

chain. - Nitrogen fixation protein fixR from Bradyrhizobium japonicum. - Bacillus subtilis protein dltE which is involved
in the biosynthesis of D-alanyl-lipoteichoic acid. - Human follicular variant translocation protein 1 (FVT1). - Mouse
adipocyte protein p27. - Mouse protein Ke 6. - Maize sex determination protein TASSELSEED 2. - Sarcophaga peregrina
25 Kd development specific protein. - Drosophila fat body protein P6. - A Listeria monocytogenes hypothetical
protein encoded in the internalins gene region. - Escherichia coli hypothetical protein yciK. - Escherichia coli hypothetical
protein ydfG. - Escherichia coli hypothetical protein yjgI. - Escherichia coli hypothetical protein yjgU. - Escherichia
coli hypothetical protein yohF. - Bacillus subtilis hypothetical protein yoxD. - Bacillus subtilis hypothetical protein ywfD.
- Bacillus subtilis hypothetical protein ywfH. - Yeast hypothetical protein YIL124w. - Yeast hypothetical protein YIR035c.
- Yeast hypothetical protein YIR036c. - Yeast hypothetical protein YKL055c. - Fission yeast hypothetical protein
SpAC23D3.11. One of the best conserved regions, which includes two perfectly conserved residues, a tyrosine and a
lysine, has been used as a signature pattern for this family of proteins. The tyrosine residue participates in the catalytic
mechanism.

Consensus pattern: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFYR]-[LIVMSTAGD]-
x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM] [Y is an active site residue]

[ 1] Joernvall H., Persson B., Krook M., Atrian S., Gonzalez-Duarte R., Jeffery J., Ghosh D. Biochemistry 34:
6003-6013(1995).

[ 2] Villarroya A., Juan E., Egestad B., Joernvall H. Eur. J. Biochem. 180:191-197(1989).

[ 3] Persson B., Krook M., Joernvall H. Eur. J. Biochem. 200:537-543(1991).

[ 4] Neidle E.L., Hartnett C., Ornston N.L., Bairoch A., Rekik M., Harayama S. Eur. J. Biochem. 204:113-120(1992).
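
The short-chain dehydrogenase signature contains an excluded-residue element, {PC}, and marks the tyrosine as an active site residue. As a hedged illustration, the sketch below hand-translates the pattern into a regular expression (with {PC} written as `[^PC]`) and reports the 1-based positions of the signature tyrosine and lysine in each hit. The names `SDR_SIGNATURE` and `active_site_positions` and the toy peptide are assumptions made for this example only.

```python
import re

# Hand translation of the consensus pattern above; {PC} becomes [^PC].
# Named groups mark the conserved tyrosine and lysine of the signature.
SDR_SIGNATURE = re.compile(
    r"[LIVSPADNK].{12}(?P<tyr>Y)[PSTAGNCV][STAGNQCIVM][STAGC](?P<lys>K)"
    r"[^PC][SAGFYR][LIVMSTAGD].{2}[LIVMFYW].{3}[LIVMFYWGAPTHQ][GSACQRHM]"
)

def active_site_positions(seq):
    """Return 1-based (tyrosine, lysine) positions for each signature hit in seq."""
    return [(m.start("tyr") + 1, m.start("lys") + 1)
            for m in SDR_SIGNATURE.finditer(seq)]

# Hand-built toy peptide that satisfies the pattern; it is not a real protein sequence.
toy = ("MM" + "A" + "Q" * 12 + "Y" + "S" + "A" + "S" + "K"
       + "A" + "A" + "L" + "QQ" + "L" + "QQQ" + "L" + "G" + "MM")
print(active_site_positions(toy))
```

Replacing the residue after the lysine with P or C makes the toy peptide fail, which is the behavior the {PC} element is meant to express.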

[0238] 46. (adh_zinc) Zinc-containing alcohol dehydrogenases signatures. Alcohol dehydrogenase (EC 1.1.1.1)
(ADH) catalyzes the reversible oxidation of ethanol to acetaldehyde with the concomitant reduction of NAD [1]. Currently
three structurally and catalytically different types of alcohol dehydrogenases are known: - Zinc-containing 'long-chain'
alcohol dehydrogenases. - Insect-type, or 'short-chain' alcohol dehydrogenases. - Iron-containing alcohol dehydrogenases.
Zinc-containing ADH's [2,3] are dimeric or tetrameric enzymes that bind two atoms of zinc per subunit. One of
the zinc atoms is essential for catalytic activity while the other is not. Both zinc atoms are coordinated by either cysteine
or histidine residues; the catalytic zinc is coordinated by two cysteines and one histidine. Zinc-containing ADH's are
found in bacteria, mammals, plants, and in fungi. In most species there is more than one isozyme (for example,
humans have at least six isozymes, yeast have three, etc.). A number of other zinc-dependent dehydrogenases are
closely related to zinc ADH [4]; these are: - Xylitol dehydrogenase (EC 1.1.1.9) (D-xylulose reductase). - Sorbitol dehydrogenase
(EC 1.1.1.14). - Aryl-alcohol dehydrogenase (EC 1.1.1.90) (benzyl alcohol dehydrogenase). - Threonine
3-dehydrogenase (EC 1.1.1.103). - Cinnamyl-alcohol dehydrogenase (EC 1.1.1.195) (CAD) [5]. CAD is a plant enzyme
involved in the biosynthesis of lignin. - Galactitol-1-phosphate dehydrogenase (EC 1.1.1.251). - Pseudomonas putida
5-exo-alcohol dehydrogenase (EC 1.1.1.-) [6]. - Escherichia coli starvation sensing protein rspB. - Escherichia coli
hypothetical protein yjgB. - Escherichia coli hypothetical protein yjgV. - Escherichia coli hypothetical protein yjjN. - Yeast
hypothetical protein YAL060w (FUN49). - Yeast hypothetical protein YAL061w (FUN50). - Yeast hypothetical protein
YCR105w. The pattern that has been developed to detect this class of enzymes is based on a conserved region that
includes a histidine residue which is the second ligand of the catalytic zinc atom. This family also includes NADP-dependent
quinone oxidoreductase (EC 1.6.5.5), an enzyme found in bacteria (gene qor), in yeast and in mammals
where, in some species such as rodents, it has been recruited as an eye lens protein and is known as zeta-crystallin
[7]. The sequence of quinone oxidoreductase is distantly related to that of other zinc-containing alcohol dehydrogenases
and it lacks the zinc-ligand residues. The torpedo fish and mammalian synaptic vesicle membrane protein vat-1 is related
to qor. A specific pattern has been developed for this subfamily.
Consensus pattern: G-H-E-x(2)-G-x(5)-[GA]-x(2)-[IVSAC] [H is a zinc ligand]

Consensus pattern: [GSD]-[DEQH]-x(2)-L-x(3)-[SA](2)-G-G-x-G-x(4)-Q-x(2)-[KR]

[ 1] Branden C.-I., Joernvall H., Eklund H., Furugren B. (In) The Enzymes (3rd edition) 11:104-190(1975).
[ 2] Joernvall H., Persson B., Jeffery J. Eur. J. Biochem. 167:195-201(1987).
[ 3] Sun H.-W., Plapp B.V. J. Mol. Evol. 34:522-535(1992).
[ 4] Persson B., Hallborn J., Walfridsson M., Hahn-Haegerdal B., Keraenen S., Penttilae M., Joernvall H. FEBS
Lett. 324:9-14(1993).

[ 5] Knight M.E., Halpin C., Schuch W. Plant Mol. Biol. 19:793-801(1992).

[ 6] Koga H., Aramaki H., Yamaguchi E., Takeuchi K., Horiuchi T., Gunsalus I.C. J. Bacteriol. 166:1089-1095(1986).
[ 7] Joernvall H., Persson B., Du Bois G., Lavers G.C., Chen J.H., Gonzalez P., Rao P.V., Zigler J.S. Jr. FEBS Lett.
322:240-244(1993).
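
Because this entry carries two signatures, one for the zinc-containing ADHs and one for the quinone oxidoreductase subfamily, a sequence can be checked against both in turn. The following is a minimal sketch under stated assumptions: the two regular expressions are hand translations of the consensus patterns above, and the table name `SIGNATURES`, the function `classify`, and the toy peptides are hypothetical names and data introduced for illustration.

```python
import re

# Hand translations of the two consensus patterns given in this entry.
SIGNATURES = {
    "zinc ADH": re.compile(r"GHE.{2}G.{5}[GA].{2}[IVSAC]"),
    "quinone oxidoreductase / zeta-crystallin":
        re.compile(r"[GSD][DEQH].{2}L.{3}[SA]{2}GG.G.{4}Q.{2}[KR]"),
}

def classify(seq):
    """Return the names of all signatures found in seq, in table order."""
    return [name for name, rx in SIGNATURES.items() if rx.search(seq)]

# Hand-built toy peptides, one per signature; neither is a real protein sequence.
adh_like = "MK" + "GHE" + "AA" + "G" + "TTTTT" + "G" + "TT" + "V" + "QQ"
qor_like = ("GD" + "AA" + "L" + "TTT" + "SA" + "GG" + "T" + "G"
            + "TTTT" + "Q" + "TT" + "K")

print(classify(adh_like))
print(classify(qor_like))
```

A sequence that matches neither expression yields an empty list, which only means that neither signature was found and not that the protein is unrelated to the family.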

[0239] 47. (aldedh) Aldehyde dehydrogenases active sites 

[0240] Aldehyde dehydrogenases (EC 1.2.1.3 and EC 1.2.1.5) are enzymes which oxidize a wide variety of aliphatic 
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enzyme, and class IV a microsomal enzyme ^SS^!? T^"' ""^^ cytosic 

bacterial species. A number ol eraym^ ZlTStoS^ ^ '""S^ 

5 enzymes are listed below. - Plantsld t^^tS^^S^^j:^'"^ '° "^'"^^ ^^ehyd-ogenasesfthS 
catalyzes me tet step In the biosynthesis of betehe Zfai^S^S'Z^ '^''^^'^ I^J' ^ «hat 
phate dehydrogenase (EC 1 .2. i .9). - Escherichia cofi su^^lf »^ NADP-dependent 9lyce.aldehyde-3i>hos- 
teene gabD) (3,. which '^^^^^^Ts^^^^^^Zl^.^^ ^'^^^''*> ^^^^ 

(ECr2i22) (gene aid) (4J. - Mammalian succinatnS^S^^:: «>« bctaWehyde dehydr^^i^ii; 

o P^-enylacetaWehyde dehydrogenase (EC It.SJ^^^ 

.aldehyde dehydrogenase (gene hpcC). - PseliS^s p^T^^^LT '^^'^^''^■''^'^'^'^ ^ 
(9enesdmpCandxylQ).anenzymeintiemeta^l^eim^v,^^^^ semialdehyde dehydrogenase fS] 

- Bacterial and mammalian methylmalonate-sem^lTS 

invoVed in the distal pathway o' valine cmaSr- t2^^^^ (EC 12^27, (6]. an enzyme 

' JJ^^[^<9enePUT2).whichconvertsprolinetol^uta,^.C;^^^^^ clehydrogenaseTc 
a delta-1 -pyrroline- S^rboxylato dehydrogenase dOmSTrLsTl! h "'"""""^''^ P"tA protein, which contains 
induced by dehydration of shoots (81 ■^^J^ne^;!^;^:^^^''''''' '""^ ^ 
cytosolic enzyme responsibleforthe NADPKlepen^mS (ECi5i6) [gj. This is a 

rahydrofdate. It is an protein of about 900 am^Sl SZtlTJ^ ""^^ lO-fomvUetrahydrofolate into tet- 

• rescues) is stnicturally and functionally related to aSll IZ^^ ^ ^ *>^i" (480 

- Yeas, hypothetical pro.eh YER073w - Ye^X^Z^^^^'^:: " Protein YBRoiew. 
protern F01 F1.6.A glutamic acid and a cysteine Su^^^f!! ?^- " ^^^"•"^^''ditis elegans hypothetical 

f JJ w T' ^^"P^' ^ - Biochemistry 28:1160-1167(1989) 

[ 2] Weretilnyk E.A., Hanson A.D. Proc. Natl. Acad. Sci. U.S.A. 87:2745-2749(1990).

[ 3] Niegemann E.. Schuiz A. BartsS^K^nf^ . 87:2745-2749(1990). 

[ 4J Hidalgo E., Chen Y-M UnT^ a^^?' ^ T^' '^ "^-^OSa^). 

[0241] 48. Aldo/keto reductase family signatures

ent«^:srrrw^r~^^^^^^^ 

•Juctase (EC 1112). . Aldose reductase (EcT 121 13!.^^!^ '° ^'""^ '^'""y ^re: - AldehySe re- 
errrjinates androgen acUon by converting L^^d^^S^Z^lTn*^^^^ (ECi1i50)'which 
synthase (EC 1J .1.188) which catalyzes the r^ucCSTrSS. JZi wt^^^^^^ ' P'°«'«9landin F 

phate dehydrogenase (EC 1.l.i.20o' from apT ^omh^ fiS!^ ^ ^"'^ ''^■^'P^- ' D-«»rbitol*phos. 
pufda plasmid pMDH7.2 (gl^iT^i,. . ChrLr^S^tas^fpfTrf^^^^^^ PseudoL,as 
rdecone (kepone) to the corresponding alcohol ^itet^D nhJ^>^^> **** P««««=«e chio- 

the reduction Of 2.5^iketogluconic acid to Jketo-L^^^^^^^^^^ ^ <EC 1.1.1.-) which catalyzes 

- NAD(P)H-dependent xylose reduc J°ecTm ) r^t^e v^J^*'!!''^'*''"'" P^'^"'^'^ 
xy^rt^- Trans-1 .2-<ihydrobenzene.1.2-diol dehi^ j,lr(K ™' ^"^'"^ ^°«e *nto 

vamS) Which catalyzes the reduction of dettSSS^.S^^' J ^'^''^-S-^'eta-steroid 4KJehydrogenase (EC 
synthase in the formation of 4.2-.4Mrihydrorchalc^r F^^O ^ "'"fch co-acts wS. chalcone 

function is not known. - Leishmania major pS, E prot;in ?i,Ti p^^^ T"""" ' "^^^^ 
abundance is marked^ elevated in promastigotes cori^l vi^^^^^^ ctevelopmentally regulated protein whose 
Escherichia coli hypothetical protein yafB SieS^ ^ TT"^^' """^""^ « "^t yet known - 

YBR149W. - Yeast hypotheticaUteinCRlST- Sn^ll^^^^^ P''*'^" " ^east hypothetical p^Jein 
300 am,no acid residues. Three consensuV»I^veS?nS^^^ alfabout 
The firs, one is located in the N-terminal sectKhe^ prote^rj^r T "^^"'^ '° '""^ '^"^ P'o^^ins. 
-.h.rdpanern.^.edin.eC-termina,.centeri3S^d:rl^^^^^^ 
aldehyde reductases, affect the catalytic efficiency.

Consensus pattern: G-[FY]-R-[HSAL]-[LIVMF]-D-[STAGC]-[AS]-x(5)-E-x(2)-[LIVM]-G
Consensus pattern: [LIVMFY]-x(9)-[KREQ]-x-[LIVM]-G-[LIVM]-[SC]-N-[FY]

Consensus pattern: [LIVM]-[PAIV]-[KR]-[ST]-x(4)-R-x(2)-[GSTAEQK]-[NSL]-x(2)-[LIVMFA] [K is a putative active site
residue]

[ 1] Bohren K.M., Bullock B., Wermuth B., Gabbay K.H. J. Biol. Chem. 264:9547-9551(1989).
[ 2] Bruce N.C., Willey D.L., Coulson A.F.W., Jeffery J. Biochem. J. 299:805-811(1994).

[0242] 49. Alpha amylase. This family is classified as family 13 of the glycosyl hydrolases. The structure is an 8
stranded alpha/beta barrel, interrupted by a ~70 a.a. calcium-binding domain protruding between beta strand 3 and
alpha helix 3, and a carboxyl-terminal Greek key beta-barrel domain.

[1] Larson SB, Greenwood A, Cascio D, Day J, McPherson A; J Mol Biol 1994;235:1560-1584.
[0243] 50. Aminotransferases class-l pyrldoxal-phosphate attachment site 

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as
the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these
various enzymes can be grouped [1,2] into subfamilies. One of these, called class-I, currently consists of the following
enzymes: - Aspartate aminotransferase (AAT) (EC 2.6.1.1). AAT catalyzes the reversible transfer of the amino group
from L-aspartate to 2-oxoglutarate to form oxaloacetate and L-glutamate. In eukaryotes, there are two AAT isozymes:
one is located in the mitochondrial matrix, the second is cytoplasmic. In prokaryotes, only one form of AAT is found
(gene aspC). - Tyrosine aminotransferase (EC 2.6.1.5), which catalyzes the first step in tyrosine catabolism by reversibly
transferring its amino group to 2-oxoglutarate to form 4-hydroxyphenylpyruvate and L-glutamate. - Aromatic
aminotransferase (EC 2.6.1.57), involved in the synthesis of Phe, Tyr, Asp and Leu (gene tyrB). -
1-aminocyclopropane-1-carboxylate synthase (EC 4.4.1.14) (ACC synthase) from plants. ACC synthase catalyzes the first
step in ethylene biosynthesis. - Pseudomonas denitrificans cobC, which is involved in cobalamin biosynthesis. - Yeast
hypothetical protein YJL060w. The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is
sufficiently conserved to allow the creation of a specific pattern.

Consensus pattern: [GS]-[LIVMFYTAC]-[GSTA]-K-x(2)-[GSALVN]-[LIVMFA]-x-[GNAR]-x-R-[LIVMA]-[GA] [K is the
pyridoxal-P attachment site]

[ 1] Bairoch A. Unpublished observations (1992).

[ 2] Sung M.H., Tanizawa K., Tanaka H., Kuramitsu S., Kagamiyama H., Hirotsu K., Okamoto A., Higuchi T., Soda
K. J. Biol. Chem. 266:2567-2572(1991).

[0244] 51. Aminotransferases class-II pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as
the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these
various enzymes can be grouped [1] into subfamilies. One of these, called class-II, currently consists of the following
enzymes: - Glycine acetyltransferase (EC 2.3.1.29), which catalyzes the addition of acetyl-CoA to glycine to form
2-amino-3-oxobutanoate (gene kbl). - 5-aminolevulinic acid synthase (EC 2.3.1.37) (delta-ALA synthase), which
catalyzes the first step in heme biosynthesis via the Shemin (or C4) pathway, i.e. the addition of succinyl-CoA to glycine
to form 5-aminolevulinate. - 8-amino-7-oxononanoate synthase (EC 2.3.1.47) (7-KAP synthetase), a bacterial enzyme
(gene bioF) which catalyzes an intermediate step in the biosynthesis of biotin: the addition of 6-carboxy-hexanoyl-CoA
to alanine to form 8-amino-7-oxononanoate. - Histidinol-phosphate aminotransferase (EC 2.6.1.9), which catalyzes
the eighth step in the histidine biosynthetic pathway: the transfer of an amino group from 3-(imidazol-4-yl)-2-oxopropyl
phosphate to glutamic acid to form histidinol phosphate and 2-oxoglutarate. - Serine palmitoyltransferase (EC 2.3.1.50)
from yeast (genes LCB1 and LCB2), which catalyzes the condensation of palmitoyl-CoA and serine to form
3-ketosphinganine. The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is sufficiently
conserved to allow the creation of a specific pattern.

Consensus pattern: T-[LIVMFYW]-[STAG]-K-[SAG]-[LIVMFYWR]-[SAG]-x(2)-[SAG]
[K is the pyridoxal-P attachment site]

[ 1] Bairoch A. Unpublished observations (1991).

[0245] 52. Aminotransferases class-III pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as
the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these
various enzymes can be grouped [1,2] into subfamilies. One of these, called class-III, currently consists of the following
enzymes: - Acetylornithine aminotransferase (EC 2.6.1.11), which catalyzes the transfer of an amino group from
acetylornithine to alpha-ketoglutarate, yielding N-acetyl-glutamic-5-semi-aldehyde and glutamic acid. - Ornithine
aminotransferase [...] an amino group from GABA to alpha-ketoglutarate, yielding succinate semialdehyde and glutamic
acid. - DAPA aminotransferase (EC 2.6.1.62), a bacterial enzyme (gene bioA) which catalyzes an intermediate step in the
biosynthesis of biotin, the transamination of 7-keto-8-aminopelargonic acid to form 7,8-diaminopelargonic acid (DAPA).
- 2,2-dialkylglycine decarboxylase (EC 4.1.1.64), an enzyme (gene dgdA) that catalyzes the decarboxylating amino
transfer of 2,2-dialkylglycine [...] alanine and carbon dioxide. - Glutamate-1-semialdehyde aminotransferase
(EC 5.4.3.8) [...] involved in the second step of porphyrin biosynthesis, via the C5 pathway. It transfers the amino
group [...] of glutamate-1-semialdehyde to the neighbouring carbon, to give delta-aminolevulinic acid. - Bacillus
subtilis [...] aminotransferase yodT. - Haemophilus influenzae aminotransferase [...].

[ 1] Bairoch A. Unpublished observations (1992).
[ 2] Yonaha K., Nishie M., Aibara S. J. Biol. Chem. 267:12506-12510(1992).

[0246] 53. Ank repeat. There's no clear separation between noise and signal on the HMM search [...]

[1] [...] A, Holak TA, FEBS Lett 1997;401:127-132.

[2] Lux SE, John KM, Bennett V, Nature 1990;345:736-739.

[0247] 54. Aminotransferases class-IV signature

[...] amino group [...] to 2-oxoglutarate, to form pyruvate and D-aspartate. [...]

Consensus pattern: E-x-[STAGCI]-x(2)-N-[LIVMFAC]-[FY]-x(6,12)-[LIVMF]-x-T-x(6,8)-[LIVM]-x-[GS]-[LIVM]-[...]

[1] Green J.M., Merkel W.K., Nichols B.P. J. Bacteriol. 174:5317-5323(1992).
[2] Bairoch A. Unpublished observations (1992).

[0250] 55. Aminotransferases class-V pyridoxal-phosphate attachment site

[...] On the basis of sequence similarity, these various enzymes can be grouped [1,2] into subfamilies. One of these,
called class-V, currently consists of the following enzymes: - Phosphoserine aminotransferase [...] torquens. -
Serine-pyruvate aminotransferase (EC 2.6.1.51). This enzyme also acts as an alanine-glyoxylate aminotransferase
(EC 2.6.1.44). In vertebrates, it is located in the peroxisomes and/or mitochondria. - Isopenicillin N
epimerase (gene cefD). This enzyme is involved in the biosynthesis of cephalosporin antibiotics and catalyzes the
reversible isomerization of isopenicillin N and penicillin N. - NifS, a protein of the nitrogen fixation operon of some
bacteria and cyanobacteria. The exact function of nifS is not yet known. A highly similar protein has been found in fungi
(gene NFS1 or SPL1). - The small subunit of cyanobacterial soluble hydrogenase (EC 1.12.-.-). - Hypothetical protein
ycbU from Bacillus subtilis. - Hypothetical protein YFL030w from yeast. The sequence around the pyridoxal-phosphate
attachment site of this class of enzyme is sufficiently conserved to allow the creation of a specific pattern.
Consensus pattern: [LIVFYCHT]-[DGH]-[LIVMFYAC]-[LIVMFYA]-x(2)-[GSTAC]-[GSTA]-[HQR]-K-x(4,6)-G-x-[GSAT]-
x-[LIVMFYSAC] [K is the pyridoxal-P attachment site]

[ 1] Ouzounis C., Sander C. FEBS Lett. 322:159-164(1993).
[ 2] Bairoch A. Unpublished observations (1992).
[ 3] van der Zel A., Lam H.-M., Winkler M.E. Nucleic Acids Res. 17:8379-8379(1989).
[0251] 56. Annexins repeated domain signature 

Annexins [1 to 6] are a group of calcium-binding proteins that associate reversibly with membranes. They bind to
phospholipid bilayers in the presence of micromolar free calcium concentration. The binding is specific for calcium and
for acidic phospholipids. Annexins have been claimed to be involved in cytoskeletal interactions, phospholipase
inhibition, intracellular signalling, anticoagulation, and membrane fusion. Each of these proteins consists of an N-terminal
domain of variable length followed by four or eight copies of a conserved segment of sixty-one residues. The repeat
(sometimes known as an 'endonexin fold') consists of five alpha-helices that are wound into a right-handed superhelix
[7]. The proteins known to belong to the annexin family are listed below: - Annexin I (Lipocortin 1) (Calpactin 2) (p35)
(Chromobindin 9). - Annexin II (Lipocortin 2) (Calpactin 1) (Protein I) (p36) (Chromobindin 8). - Annexin III (Lipocortin
3) (PAP-III). - Annexin IV (Lipocortin 4) (Endonexin I) (Protein II) (Chromobindin 4). - Annexin V (Lipocortin 5)
(Endonexin 2) (VAC-alpha) (Anchorin CII) (PAP-I). - Annexin VI (Lipocortin 6) (Protein III) (Chromobindin 20) (p68) (p70).
This is the only known annexin that contains 8 (instead of 4) repeats. - Annexin VII (Synexin). - Annexin VIII (Vascular
anticoagulant-beta) (VAC-beta). - Annexin IX from Drosophila. - Annexin X from Drosophila. - Annexin XI
(Calcyclin-associated annexin) (CAP-50). - Annexin XII from Hydra vulgaris. - Annexin XIII (Intestine-specific annexin)
(ISA). The signature pattern for this domain spans positions 9 to 61 of the repeat and includes the only perfectly
conserved residue (an arginine in position 22).

Consensus pattern: [TG]-[STV]-x(8)-[LIVMF]-x(2)-R-x(3)-[DEQNH]-x(7)-[IFY]-x(7)-[LIVMF]-x(3)-[LIVMF]-x(11)-
[LIVMFA]-x(2)-[LIVMF]

[ 1] Raynal P., Pollard H.B. Biochim. Biophys. Acta 1197:63-93(1994).
[ 2] Barton G.J., Newman R.H., Freemont P.S., Crumpton M.J. Eur. J. Biochem. 198:749-760(1991).
[ 3] Burgoyne R.D., Geisow M.J. Cell Calcium 10:1-10(1989).
[ 4] Haigler H.T., Fitch J.M., Jones J.M., Schlaepfer D.D. Trends Biochem. Sci. 14:48-50(1989).
[ 5] Klee C.B. Biochemistry 27:6645-6653(1988).
[ 6] Smith P.D., Moss S.E. Trends Genet. 10:241-246(1994).
[ 7] Huber R., Roemisch J., Paques E.-P. EMBO J. 9:3867-3874(1990).
[ 8] Fiedler K., Simons K. Trends Biochem. Sci. 20:177-178(1995).

[0252] 57. (arf_1) ADP-ribosylation factors family signature

ADP-ribosylation factors (ARF) [1,2,3,4] are 20 Kd GTP-binding proteins involved in protein trafficking. They may
modulate vesicle budding and uncoating within the Golgi apparatus. ARFs also act as allosteric activators of cholera toxin
ADP-ribosyltransferase activity. They are evolutionarily conserved and present in all eukaryotes. At least six forms of
ARF are present in mammals and three in budding yeast. The ARF family also includes proteins highly related to ARFs
but which lack the cholera toxin cofactor activity; they are collectively known as ARLs (ARF-like). ARD1 is a 64 Kd
mammalian protein of unknown biological function that contains an ARF domain at its C-terminal extremity. Proteins
from the ARF family are generally included in the RAS 'superfamily' of small GTP-binding proteins [5], but they are
only slightly related to the other RAS proteins. They also differ from RAS proteins in that they lack cysteine residues
at their C-termini and are therefore not subject to prenylation. The ARFs are N-terminally myristoylated (the ARLs have
not yet been shown to be modified in such a fashion). A conserved region in the C-terminal part of ARFs and ARLs
has been selected as a signature pattern.

Consensus pattern: [HRQT]-x-[FYWI]-x-[LIVM]-x(4)-A-x(2)-G-x(2)-[LIVM]-x(2)-[GSA]-[LIVMF]-x-[WK]-[LIVM]
Note: proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop) (see <PDOC00017>).

[ 1] Boman A.L., Kahn R.A. Trends Biochem. Sci. 20:147-150(1995).
[ 2] Moss J., Vaughan M. Cell. Signal. 4:367-399(1993).
[ 3] Moss J., Vaughan M. Prog. Nucleic Acid Res. Mol. Biol. 45:47-65(1993).
[ 4] Amor J.C., Harrison D.H., Kahn R.A., Ringe D. Nature 372:704-708(1994).
[ 5] Valencia A., Chardin P., Wittinghofer A., Sander C. Biochemistry 30:4637-4648(1991).

[0253] (arf_2) ATP/GTP-binding site motif A (P-loop)

From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable
proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs. The best
conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an
alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally
referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous ATP- or GTP-binding proteins in
which the P-loop is found. A number of protein families for which the relevance of the presence of such a motif has been
noted are listed below: - ATP synthase alpha and beta subunits (see <PDOC00137>). - Myosin heavy chains. - Kinesin
heavy chains and kinesin-like proteins (see <PDOC00343>). - Dynamins and dynamin-like proteins (see
<PDOC00362>). - Guanylate kinase (see <PDOC00670>). - Thymidine kinase (see <PDOC00524>). - Thymidylate
kinase (see <PDOC01034>). - Shikimate kinase (see <PDOC00868>). - Nitrogenase iron protein family (nifH/frxC)
(see <PDOC00580>). - ATP-binding proteins involved in 'active transport' (ABC transporters) [7] (see <PDOC00185>).
- DNA and RNA helicases [8,9,10]. - GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.). - Ras family
of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.). - Nuclear protein ran (see <PDOC00859>). -
ADP-ribosylation factors family (see <PDOC00781>). - Bacterial dnaA protein (see <PDOC00771>). - recA protein
(see <PDOC00131>). - Bacterial recF protein (see <PDOC00539>). - Guanine nucleotide-binding proteins alpha
subunits (Gi, Gs, Gt, G0, etc.). - DNA mismatch repair proteins mutS family (see <PDOC00388>). - Bacterial type II
secretion system protein E (see <PDOC00567>). Not all ATP- or GTP-binding proteins are picked up by this motif. A
number of proteins escape detection because the structure of their ATP-binding site is completely different from that
of the P-loop. Examples of such proteins are the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding
proteins the flexible loop exists in a slightly different form; this is the case for tubulins or protein kinases. A special
mention must be reserved for adenylate kinase, in which there is a single deviation from the P-loop pattern: in the last
position Gly is found instead of Ser or Thr.

Consensus pattern: [AG]-x(4)-G-K-[ST]
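
The consensus patterns in this section follow the PROSITE notation, in which elements are separated by hyphens, x stands for any residue, square brackets list the residues accepted at a position, and a parenthesized number (or range) gives a repeat count. As a minimal sketch, assuming only this subset of the notation, such a pattern can be translated into a regular expression and used to scan a protein sequence. The function name `prosite_to_regex` and the test sequence are hypothetical and serve only as illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string.

    Only the subset used in this section is handled: single residues,
    x (any residue), [..] residue classes, and (n) or (n,m) repeats.
    """
    parts = []
    for element in pattern.strip().rstrip(".").strip("-").split("-"):
        # Each element is a residue spec with an optional repeat suffix.
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        token = "." if residue == "x" else residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


# P-loop consensus pattern as given in the text above.
P_LOOP = "[AG]-x(4)-G-K-[ST]"
p_loop_regex = re.compile(prosite_to_regex(P_LOOP))

# Hypothetical sequence (not from the source) with a P-loop-like stretch.
toy_sequence = "MKLGSPRAGKSTLLN"
hit = p_loop_regex.search(toy_sequence)
print(p_loop_regex.pattern)
print(hit.start() if hit else None, hit.group() if hit else None)
```

A match only indicates that the sequence contains the motif; as the text notes, some ATP- or GTP-binding proteins are not picked up by it.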

[ 1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[ 2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).
[ 3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[ 4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[ 5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).
[ 6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).
[ 7] [...] J. Bioenerg. Biomembr. 22:[...]
[ 8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[ 9] [...] Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989).
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).
[0254] 58. Arginase family signatures 

The following enzymes have been shown [1] to be evolutionarily related: - Arginase (EC 3.5.3.1), a ubiquitous enzyme
which catalyzes the degradation of arginine to ornithine and urea [2]. - Agmatinase (EC 3.5.3.11) (agmatine
ureohydrolase), a prokaryotic enzyme (gene speB) that catalyzes the hydrolysis of agmatine into putrescine and urea. -
Formiminoglutamase (EC 3.5.3.8) (formiminoglutamate hydrolase), a prokaryotic enzyme (gene hutG) that hydrolyzes
N-formimino-glutamate into glutamate and formamide. - Hypothetical proteins from methanogenic archaebacteria.
These enzymes are proteins of about 300 amino-acid residues. Three conserved regions that contain charged residues
which are involved in the binding of the two manganese ions [3] can be used as signature patterns.
Consensus pattern: [LIVMF]-G-G-x-H-x-[LIVMT]-[STAV]-x-[PAG]-x(3)-[GSTA] [H binds manganese]
Consensus pattern: [LIVM](2)-x-[LIVMFY]-D-[AS]-H-x-D [The two D's and the H bind manganese]
Consensus pattern: [ST]-[LIVMFY]-D-[LIVM]-D-x(3)-[PAQ]-x(3)-P-[GSA]-x(7)-G [The two D's bind manganese]

[ 1] Ouzounis C., Kyrpides N.C. J. Mol. Evol. 39:101-104(1994).
[ 2] Jenkinson C.P., Grody W.W., Cederbaum S.D. Comp. Biochem. Physiol. 114B:107-132(1996).
[ 3] Kanyo Z.F., Scolnick L.R., Ash D.E., Christianson D.W. Nature 383:554-557(1996).
[0255] 59. (asp) Eukaryotic and viral aspartyl proteases active site 

Aspartyl proteases, also known as acid proteases (EC 3.4.23.-), are a widely distributed family of proteolytic enzymes
[1,2,3] known to exist in vertebrates, fungi, plants, retroviruses and some plant viruses. Aspartate proteases of
eukaryotes are monomeric enzymes which consist of two domains. Each domain contains an active site centered on a
catalytic aspartyl residue. The two domains most probably evolved from the duplication of an ancestral gene encoding a
primordial domain. Currently known eukaryotic aspartyl proteases are: - Vertebrate gastric pepsins A and C (also known as
gastricsin). - Vertebrate chymosin (rennin), involved in digestion and used for making cheese. - Vertebrate lysosomal
cathepsins D (EC 3.4.23.5) and E (EC 3.4.23.34). - Mammalian renin (EC 3.4.23.15), whose function is to generate
angiotensin I from angiotensinogen in the plasma. - Fungal proteases such as aspergillopepsin A (EC 3.4.23.18),
candidapepsin (EC 3.4.23.24), mucoropepsin (EC 3.4.23.23) (mucor rennin), endothiapepsin (EC 3.4.23.22),
polyporopepsin (EC 3.4.23.29), and rhizopuspepsin (EC 3.4.23.21). - Yeast saccharopepsin (EC 3.4.23.25) (proteinase A)
(gene PEP4). PEP4 is implicated in posttranslational regulation of vacuolar hydrolases. - Yeast barrier pepsin (EC
3.4.23.35) (gene BAR1); a protease that cleaves alpha-factor and thus acts as an antagonist of the mating pheromone.
- Fission yeast sxa1, which is involved in degrading or processing the mating pheromones. Most retroviruses and some
plant viruses, such as badnaviruses, encode an aspartyl protease which is a homodimer of a chain of about 95 to
125 amino acids. In most retroviruses, the protease is encoded as a segment of a polyprotein which is cleaved during
the maturation process of the virus. It is generally part of the pol polyprotein and, more rarely, of the gag polyprotein.
Conservation of the sequence around the two aspartates of eukaryotic aspartyl proteases and around the single active
site of the viral proteases allows us to develop a single signature pattern for both groups of protease.
Consensus pattern: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-x-[LIVMFGTA]
[D is the active site residue] Note: these proteins belong to families A1 and A2 in the classification of peptidases
[4,E1].

[ 1] Foltmann B. Essays Biochem. 17:52-84(1981).
[ 2] Davies D.R. Annu. Rev. Biophys. Chem. 19:189-215(1990).
[ 3] Rao J.K.M., Erickson J.W., Wlodawer A. Biochemistry 30:4663-4671(1991).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:105-120(1995).

[0256] 60. (BIRA) Biotin repressor 

[1] Wilson KP, Shewchuk LM, Brennan RG, Otsuka AJ, Matthews BW; Proc Natl Acad Sci U S A 1992;89:9257-9261.
[0257] 61. BTB/POZ domain 

The BTB (for BR-C, ttk and bab) [1] or POZ (for Pox virus and Zinc finger) [2] domain is present near the N-terminus
of a fraction of zinc finger (zf-C2H2) proteins and in proteins that contain the Kelch motif, such as Kelch and a family
of pox virus proteins. The BTB/POZ domain mediates homomeric dimerisation and in some
instances heteromeric dimerisation [2]. The structure of the dimerised PLZF BTB/POZ domain has been solved and
consists of a tightly intertwined homodimer. The central scaffolding of the protein is made up of a cluster of
alpha-helices flanked by short beta-sheets at both the top and bottom of the molecule [3]. POZ domains from several zinc
finger proteins have been shown to mediate transcriptional repression and to interact with components of histone
deacetylase co-repressor complexes including N-CoR and SMRT [4,5,6]. The POZ or BTB domain is also known as
BR-C/Ttk or ZiN.

[1] Zollman S, Godt D, Prive GG, Couderc JL, Laski FA; Proc Natl Acad Sci U S A 1994;91:10717-10721.
[2] Bardwell VJ, Treisman R; Genes Dev 1994;8:1664-1677.
[3] Ahmad KF, Engel CK, Prive GG; Proc Natl Acad Sci U S A 1998;95:12123-12128.
[4] Deweindt C, Albagli O, Bernardin F, Dhordain P, Quief S, Lantoine D, Kerckaert JP, Leprince D; Cell Growth
Differ 1995;6:1495-1503.
[5] Huynh KD, Bardwell VJ; Oncogene 1998;17:2473-2484.
[6] Wong CW, Privalsky ML; J Biol Chem 1998;273:27695-27702.

[0258] 62. (Bac GSPproteins) Bacterial type II secretion system protein D signature 

A number of bacterial proteins, some of which are involved in a general secretion pathway (GSP) for the export of
proteins (also called the type II pathway) [1 to 5], have been found to be evolutionarily related. These proteins are listed
below: - The 'D' protein from the GSP operon of: Aeromonas (gene exeD); Erwinia (gene outD); Escherichia coli (gene
yheF); Klebsiella pneumoniae (gene pulD); Pseudomonas aeruginosa (gene xcpQ); Vibrio cholerae (gene epsD) and
Xanthomonas campestris (gene xpsD). - comE from Haemophilus influenzae, involved in competence (DNA uptake).
- pilQ from Pseudomonas aeruginosa, which is essential for the formation of the pili. - hofQ (hopQ) from Escherichia
coli. - hrpH from Pseudomonas syringae, which is involved in the secretion of a proteinaceous elicitor of the
hypersensitivity response in plants. - hrpA1 from Xanthomonas campestris pv. vesicatoria, which is also involved in the
hypersensitivity response. - mxiD from Shigella flexneri, which is involved in the secretion of the Ipa invasins which are
necessary for penetration of intestinal epithelial cells. - omc from Neisseria gonorrhoeae. - yscC from Yersinia
enterocolitica virulence plasmid pYV, which seems to be required for the export of the Yop virulence proteins. - The gpIV
protein from filamentous phages such as f1, ike, or M13. GpIV is said to be involved in phage assembly and
morphogenesis. These proteins all seem to start with a signal sequence and are thought to be integral proteins in the outer
membrane. As a signature pattern a conserved region in the C-terminal section of these proteins has been selected.
Consensus pattern: [GR]-[DEQKG]-[STVM]-[LIVM](3)-[GA]-G-[LIVMFY]-x(11)-[LIVM]-P-[...]-
[GSAE]-x-[LIVM]-P-[LIVMFYW](2)-x(2)-[LV]-F

[ 1] Salmond G.P.C., Reeves P.J. Trends Biochem. Sci. 18:7-12(1993).
[ 2] Reeves P.J., Whitcombe D., Wharam S., Gibson M., Allison G., Bunce N., Barallon R., Douglas R., Mulholland
V., Stevens S., Walker S., Salmond G.P.C. Mol. Microbiol. 8:443-456(1993).
[ 3] Martin P.R., Hobbs M., Free P.D., Jeske Y., Mattick J.S. Mol. Microbiol. 9:857-868(1993).
[ 4] Hobbs M., Mattick J.S. Mol. Microbiol. 10:233-243(1993).
[ 5] Genin S., Boucher C.A. Mol. Gen. Genet. 243:112-118(1994).

[0259] 63. (Bac globin) Protozoan/cyanobacterial globins signature

Globins are heme-containing proteins involved in binding and/or transporting oxygen [1]. Almost all globins belong to
a large family (see <PDOC00793>); the only exceptions are the following proteins, which form a family of their own
[2,3]: - Monomeric hemoglobins from the protozoans Paramecium caudatum, Tetrahymena pyriformis and Tetrahymena
thermophila. - Cyanoglobin from the cyanobacterium Nostoc commune. - Globins LI637 and LI410 from the chloroplast
of the alga Chlamydomonas eugametos. - Mycobacterium tuberculosis hypothetical protein MtCY48.23. These proteins
contain a conserved histidine which could be involved in heme-binding. As a signature pattern, a conserved region
that ends with this residue was used.

Consensus pattern: F-[LF]-x(5)-G-[PA]-x(4)-G-[KRA]-x-[LIVM]-x(3)-H

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).
[ 2] Takagi T. Curr. Opin. Struct. Biol. 3:413-418(1993).
[ 3] Couture M., Chamberland H., St-Pierre B., Lafontaine J., Guertin M. Mol. Gen. Genet. 243:185-197(1994).
[0260] 64. Band 7 protein family signature 

Mammalian band 7 protein [1] (also known as 7.2B or stomatin) is an integral membrane phosphoprotein of red blood
cells thought to regulate cation conductance by interacting with other proteins of the junctional complex of the
membrane skeleton. Structurally, band 7 is evolutionarily related to the following proteins: - Caenorhabditis elegans protein
mec-2 [2]. Mec-2 positively regulates the activity of the putative mechanosensory transduction channel. It may link
the mechanosensory channel and the microtubule cytoskeleton of the touch receptor neurons. - Caenorhabditis elegans
proteins sto-1 to sto-4. - Caenorhabditis elegans protein unc-1. - Escherichia coli hypothetical protein ybbK. -
Mycobacterium tuberculosis hypothetical protein MtCY277.09. - Synechocystis strain PCC 6803 hypothetical protein slr1128.
- Methanococcus jannaschii hypothetical protein MJ0827. Structurally all these proteins consist of a short N-terminal
domain which is followed by a transmembrane region and a variable size (from 170 to 350 residues) C-terminal domain.
As a signature pattern, a conserved region located about 110 residues after the transmembrane domain was selected.
Consensus pattern: R-x(2)-[LIV]-[SAN]-x(6)-[LIV]-D-x(2)-T-x(2)-W-G-[LIV]-[KRH]-[LIV]-x-[KR]-[LIV]-E-[LIV]-[KR]

[ 1] Gallagher P.G., Forget B.G. J. Biol. Chem. 270:26358-26363(1995).
[ 2] Huang M., Gu G., Ferguson E.L., Chalfie M. Nature 378:292-295(1995).

[0261] 65. Barwin domain signatures 

Barwin [1] is a barley seed protein of 125 residues that binds weakly a chitin analog. It contains six cysteines involved
in disulfide bonds, as shown in the following schematic representation.

xxxxxxxxxxxxxxxCxxxxxxxxxxCxxxxCxCxxxxxxxxCxxxxxxxxxxxxxxxxxxCx

'C': conserved cysteine involved in a disulfide bond; '*': position of the patterns.

Barwin is closely related to the following proteins: - Hevein, a wound-induced protein found in the latex of rubber trees.
- HEL, an Arabidopsis thaliana hevein-like protein [2]. - Win1 and win2, two wound-induced proteins from potato. -
Pathogenesis-related protein 4 from tobacco. Hevein and the win1/2 proteins consist of an N-terminal chitin-binding domain
followed by a barwin-like C-terminal domain. Barwin and its related proteins could be involved in a defense mechanism
in plants. As signature patterns, two highly conserved regions that contain some of the cysteines were selected.
Consensus pattern: C-G-[KR]-C-L-x-V-x-N [The two C's are involved in disulfide bonds]
Consensus pattern: V-[DN]-Y-[EQ]-F-V-[DN]-C [C is involved in a disulfide bond]

[ 1] Svensson B., Svendsen I., Hoejrup P., Roepstorff P., Ludvigsen S., Poulsen F.M. Biochemistry 31:8767-8770
(1992).
[ 2] Potter S., Uknes S., Lawton K., Winter A.M., Chandler D., DiMaio J., Novitzky R., Ward E., Ryals J. Mol. Plant
Microbe Interact. 6:680-685(1993).

[0262] 66. (Bowman-Birk leg) Bowman-Birk serine protease inhibitors family signature

The Bowman-Birk inhibitor family [1] is one of the numerous families of serine proteinase
inhibitors. As can be seen in the schematic representation, they have a duplicated structure and generally possess
two distinct inhibitory sites:

< 70 residues >

'C': conserved cysteine involved in a disulfide bond.
'#': active site residue.
'*': position of the pattern.

[0263] These inhibitors are found in the seeds of all leguminous plants as well as in cereal grains. In cereals they
exist in two forms, one of which is a duplication of the basic structure shown above [2]. The pattern that was developed
to pick up sequences belonging to this family of inhibitors is in the central part of the domain and includes four cysteines.
[0264] Consensus pattern: C-x(5,6)-[DENQKRHSTA]-C-[PASTDH]-[PASTDK]-[ASTDV]-C-[NDKS]-[DEKRHSTA]-C
[The four C's are involved in disulfide bonds]. Note: this pattern can be found twice in some duplicated cereal inhibitors.
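
Because this pattern can occur twice in a duplicated inhibitor, a scan should report every occurrence rather than only the first. The following is a minimal sketch, assuming the pattern has been hand-translated into a regular expression (x(5,6) becomes `.{5,6}`, each bracket becomes a character class). The helper name `signature_starts` and the test sequences are hypothetical and synthetic; they are not real inhibitor sequences.

```python
import re
from typing import List

# Hand translation of the Bowman-Birk consensus pattern given above.
BOWMAN_BIRK = re.compile(
    r"C.{5,6}[DENQKRHSTA]C[PASTDH][PASTDK][ASTDV]C[NDKS][DEKRHSTA]C"
)


def signature_starts(sequence: str) -> List[int]:
    """Return the 0-based start index of every non-overlapping match."""
    return [match.start() for match in BOWMAN_BIRK.finditer(sequence)]


# Synthetic unit built to satisfy the pattern once (illustration only).
unit = "CGGGGGDCPPACNDC"
# Two units joined by a short linker mimic a duplicated structure.
duplicated = unit + "LLLL" + unit

print(signature_starts(unit))
print(signature_starts(duplicated))
```

Note that `finditer` reports non-overlapping matches only, which is adequate here because the two copies of the signature sit in separate halves of a duplicated domain.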



[ 1] Laskowski M., Kato I. Annu. Rev. Biochem. 49:593-626(1980).
[ 2] Tashiro M., Hashino K., Shiozaki M., Ibuki F., Maki Z. J. Biochem. 102:297-306(1987).

[0265] 67. Pathogenesis-related protein Bet v I family signature 

[0266] A number of plant proteins, which all seem to be involved in pathogen defense response, are structurally
related [1,2,3]. These proteins are:

- Bet v I, the major pollen allergen from white birch. Bet v I is the main cause of type I allergic reactions in Europe,
  North America and USSR.
- Aln g I, the major pollen allergen from alder.
- Api g I, the major allergen from celery.
- Car b I, the major pollen allergen from hornbeam.
- Cor a I, the major pollen allergen from hazel.
- Mal d I, the major pollen allergen from apple.
- Asparagus wound-induced protein AoPR1.
- Kidney bean pathogenesis-related proteins 1 and 2.
- Parsley pathogenesis-related proteins PR1-1 and PR1-3.
- Pea disease resistance response proteins pI49, pI176 and DRRG49-C.
- Pea abscisic acid-responsive proteins ABR17 and ABR18.
- Potato pathogenesis-related proteins STH-2 and STH-21.
- Soybean stress-induced protein SAM22.



[...] amino acid residues.

[ 1] [...] Breitenbach M. EMBO J. [...]
[ 2] [...] Plant Mol. Biol. 18:459-[...](1992).
[ 3] Warner S.A.J., Scott R., Draper J. Plant Mol. Biol. 19:555-561(1992).

[...] bZIP transcription factors basic domain signature

[...] dimerization. This family is quite large, therefore only a partial list of some representative members appears
here. - Transcription factor AP-1, which binds selectively to enhancer elements [...] metallothionein IIA. AP-1,
also known as c-jun, is the cellular [...]. - [...] D, probable transcription factors which are highly similar to
[...]. - [...] proto-oncogene that forms a non-covalent dimer with c-jun. - The fos-related proteins [...]. - [...]
binding proteins CREB, CREM, ATF-1, [...]. - [...] 2, a trans-acting transcriptional activator involved in the
regulation of the production [...]. - [...] G-box binding factors GBF1 to GBF4, Parsley CPRF-1 [...] EMBP-1. All these
proteins bind the G-box promoter elements [...]. - [...] the expression of both the kruppel and knirps segmentation
[...]. - [...] 2 (BBF-2), a transcriptional activator that binds [...]. - Drosophila segmentation protein
cap-n-collar (gene cnc) [...] head morphogenesis. - Caenorhabditis elegans skn-1, a developmental protein involved
[...] early embryo. - Yeast GCN4 transcription factor, a component of the [...] amino acid-synthesizing enzymes in
response to amino acid starvation [...] protein. - Neurospora crassa cys-3, which turns on the expression of [...]. -
[...] a transcriptional activator of sulfur amino acids metabolism. - [...] (or YAP1), a transcriptional activator of
the genes for some oxygen detoxification enzymes. - [...]
Consensus pattern: [KR]-x(1,3)-[RKSAQ]-N-x(2)-[...]

- Pyruvate carboxylase (EC 6.4.1.1).
- Acetyl-CoA carboxylase (EC 6.4.1.2).
- Propionyl-CoA carboxylase (EC 6.4.1.3).
- Methylcrotonoyl-CoA carboxylase (EC 6.4.1.4).
- Geranoyl-CoA carboxylase (EC 6.4.1.5).
- Urea carboxylase (EC 6.3.4.6).
- Oxaloacetate decarboxylase (EC 4.1.1.3).
- Methylmalonyl-CoA decarboxylase (EC 4.1.1.41).
- Glutaconyl-CoA decarboxylase (EC 4.1.1.70).

[ 1] Knowles J.R. Annu. Rev. Biochem. 58:195-221(1989).
[ 2] Samols D., Thornton C.G., Murtif V.L., Kumar G.K., Haase F.C., Wood H.G. J. Biol. Chem. 263:6461-6464
(1988).
[ 3] Goss N.H., Wood H.G. Meth. Enzymol. 107:261-278(1984).
[ 4] Shenoy B.C., Xie Y., Park V.L., Kumar G.K., Beegen H., Wood H.G., Samols D. J. Biol. Chem. 267:18407-18412
(1992).

[0271] 2-oxo acid dehydrogenases acyltransferase component lipoyl binding site

The 2-oxo acid dehydrogenase multienzyme complexes [1,2] from bacterial and eukaryotic sources catalyze the
oxidative decarboxylation of 2-oxo acids to the corresponding acyl-CoA. The three members of this family of multienzyme
complexes are:

- Pyruvate dehydrogenase complex (PDC).
- 2-oxoglutarate dehydrogenase complex (OGDC).
- Branched-chain 2-oxo acid dehydrogenase complex (BCOADC).

These three complexes share a common architecture: they are composed of multiple copies of three component
enzymes - E1, E2 and E3. E1 is a thiamine pyrophosphate-dependent 2-oxo acid dehydrogenase, E2 a dihydrolipamide
acyltransferase, and E3 an FAD-containing dihydrolipamide dehydrogenase.

[0272] E2 acyltransf erases have an essential cofactor, iipoic acid, which is covalently bound via a amide linkage to 
a lysine group. The E2 components of CX^CD and BCOACD bind a single lipoyi group, while those of PDC bind either 
one (in yeast and in Bacillus), two (in mammals), or three (in Azotobacter and in Escherk:hia coli) lipoyi groups [3]. 
In additbn to the E2 components of the three enzymatic complexes described above, a iipoic acid cofactor is also 
found in the folbwing proteins: 

H-protein of the glycine cleavage system (GCS) [4]. GCS is a multienzyme complex of four protein components,
which catalyzes the degradation of glycine. H protein shuttles the methylamine group of glycine from the P protein
to the T protein. H-protein from either prokaryotes or eukaryotes binds a single lipoic group.

Mammalian and yeast pyruvate dehydrogenase complexes differ from those of other sources, in that they contain,
in small amounts, a protein of unknown function - designated protein X or component X. Its sequence is closely
related to that of E2 subunits and seems to bind a lipoic group [5].

Fast migrating protein (FMP) (gene acoC) from Alcaligenes eutrophus [6].
This protein is most probably a dihydrolipamide acyltransferase involved in acetoin metabolism.

A signature pattern was developed which allows the detection of the lipoyl-binding site.

[0273] Consensus pattern: [GN]-x(2)-[LIVF]-x(5)-[LIVFC]-x(2)-[LIVFA]-x(3)-K-[STAIV]-[STAVQDN]-x(2)-[LIVMFS]-x(5)-[GCN]-x-[LIVMFY]
[K is the lipoyl-binding site]. Note: the domain around the lipoyl-binding lysine residue is evolutionarily
related to that around the biotin-binding lysine residue of biotin-requiring enzymes.

[ 1] Yeaman S.J. Biochem. J. 257:625-632(1989).

[ 2] Yeaman S.J. Trends Biochem. Sci. 11:293-296(1986).

[ 3] Russel G.C., Guest J.R. Biochim. Biophys. Acta 1076:225-232(1991).

[ 4] Fujiwara K., Okamura-Ikeda K., Motokawa Y. J. Biol. Chem. 261:8836-8841(1986).

[ 5] Behal R.H., Browning K.S., Hall T.B., Reed L.J. Proc. Natl. Acad. Sci. U.S.A. 86:8732-8736(1989).

[ 6] Priefert H., Hein S., Krueger N., Zeh K., Schmidt B., Steinbuechel A. J. Bacteriol. 173:4056-4071(1991).

[0274] 70. C2 (C2 domain) Number of members: 295 

Some isozymes of protein kinase C (PKC) [1,2] contain a domain, known as C2, of about 116 amino-acid residues
which is located between the two copies of the C1 domain (that bind phorbol esters and diacylglycerol) (see
<PDOC00379>) and the protein kinase catalytic domain (see <PDOC00100>). Regions with significant homology
[3,E1] to the C2-domain have been found in the following proteins:

- PKC isoforms alpha, beta and gamma and Drosophila isoforms PKC1 and PKC2.

- PKC isoforms delta, epsilon and eta, Caenorhabditis elegans kin-13 and yeast PKC1 have a C2-like domain at
the N-terminal extremity [4].

- Yeast cAMP dependent protein kinase SCH9 contains a C2-like domain.
- Mammalian phosphatidylinositol-specific phospholipase C (PI-PLC) (see <PDOC50007>) isoforms beta, gamma
and delta as well as several non-mammalian PI-PLCs have a C2-like domain C-terminal of the catalytic domain.
- Mammalian and plant phosphatidylinositol-3-kinases have a C2-like domain in the central region of the 110 Kd
catalytic subunit.
may have a regubtory role in the nZb^^,^^:^^,^^'^^ ^ ^'^'^"^ phospholipids and that
o. tt,e synapse. All isofom« of synaZ^^J^ S^es LT^e S ^r''!'^''^ ''^ 

- Rabphain-3A. a synaptic protein conteins>wo CaXS^s " ^^^^ 

■ 2?:sss'*,srs;"r::s;::."^'"^ 

- "^^^^AP and me breaJqxwnt cluster protein bcr havft a r9-H«f»^ - . - 

contain a C2 domain. ^ domam at the C-lerm«us. It is the onV extracellular protein known to 

- Yeast hypothetical protein YML072C has a C2 domain 

- Yeast hypothetical protein YNL087W has three C2 domains 

" ^^^S^''^ elegans hypothetical protein F37A4.7 has two C2 domains 

e.g.bind^gtoinosi.oM.3A5-teCrp^l*r,^:„^tZ^^^^^^^ 
.«..e.ween.e..rar.s2L3..p^^rre:-S;-^^^^^^^ 

sensH^e tl^ the panern. yo:s;::rserr4r^^^^^ 
So'IKrpeS™^^^^^ 

! 2I Sl^'s^T*^ ° ' ^ 208:547-557(1992) 

[ 2] Stabel S. Semin. Cancer Biol. 5:277-284(1994).

Berghuis A.M., Suedhof T.C., Sprang S.R. Cell 80:929-938(1995).
f°^r^- ^^'^ o' fnembers- 11 

the C-,ermhal domain is less cleTbut H^rS^ifto'ri^l^.:?"^^^^^^^^ 

'Sr^TjS^ >^-ru ;^„o S^^^^ I^J- OAP 

?rSr Xrn^^ °' ~s separated by a proline-r.h hinge 

tem,inal region have been deveCd '^^"^ *" ^'^^'"'^ "^e olher to a cl 

- Consensus pattern: [LIVM](2)-x-R-L-[DE]-x(4)-R-L-E

- Consensus pattern: D-[LIVMFY]-x-E-x-[PA]-x-P-E-Q-[LIVMFY]-K

[0277] 72. CAP_GLY (CAP-Gly domain)

CAP stands (or cytoskeleton-associated proteins Swte^-Pioo'w ™ k 
[0278] It has been shown [1] that some cytoskeleton-associated proteins (CAP) share the presence of a conserved,
glycine-rich domain of about 42 residues, called here CAP-Gly. Proteins known to contain this domain are listed below.

- Restin (also known as cytoplasmic linker protein-170 or CLIP-170), a 160 Kd protein associated with intermediate
filaments and that links endocytic vesicles to microtubules. Restin contains two copies of the CAP-Gly domain.

- Vertebrate dynactin (150 Kd dynein-associated polypeptide; DAP) and Drosophila glued, a major component of
activator I, a 20S polypeptide complex that stimulates dynein-mediated vesicle transport.

- Yeast protein BIK1 which seems to be required for the formation or stabilization of microtubules during mitosis and
for spindle pole body fusion during conjugation.

- Yeast protein NIP100 (NIP80).

- Human protein CKAP1/TFCB, Schizosaccharomyces pombe protein alp11 and Caenorhabditis elegans hypothetical
protein F53F4.3. These proteins contain an N-terminal ubiquitin domain (see <PDOC00271>) and a C-terminal
CAP-Gly domain.

- Caenorhabditis elegans hypothetical protein M01A8.2.
- Yeast hypothetical protein YNL148c.

Structurally, these proteins are made of three distinct parts: an N-terminal section that is most probably globular and
contains the CAP-Gly domain, a large central region predicted to be in an alpha-helical coiled-coil conformation and,
finally, a short C-terminal globular domain. The signature for the CAP-Gly domain corresponds to the first 32 residues
of the domain and includes five of the six conserved glycines.

- Consensus pattern: G-x(8,10)-[FYW]-x-G-[LIVM]-x-[LIVMFY]-x(4)-G-K-[NH]-x-G-[STAR]-x(2)-G-x(2)-[LY]-F
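
The statement that the CAP-Gly signature covers the first 32 residues of the domain and includes five conserved glycines can be checked against the pattern itself. The following is a minimal sketch, assuming the printed pattern reads G-x(8,10)-[FYW]-x-G-[LIVM]-x-[LIVMFY]-x(4)-G-K-[NH]-x-G-[STAR]-x(2)-G-x(2)-[LY]-F; the helper names `pattern_span` and `count_fixed` are hypothetical and are not part of any database tool.

```python
import re

# Assumed reading of the CAP-Gly consensus pattern (PROSITE-style syntax).
CAP_GLY = ("G-x(8,10)-[FYW]-x-G-[LIVM]-x-[LIVMFY]-x(4)-G-K-[NH]-x-G-"
           "[STAR]-x(2)-G-x(2)-[LY]-F")


def pattern_span(pattern):
    """Return (min_len, max_len) of the sequences a pattern can match."""
    lo = hi = 0
    for element in pattern.split("-"):
        # An optional trailing "(n)" or "(n,m)" gives the repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        n_min = int(m.group(2)) if m.group(2) else 1
        n_max = int(m.group(3)) if m.group(3) else n_min
        lo += n_min
        hi += n_max
    return lo, hi


def count_fixed(pattern, residue):
    """Count positions where the pattern requires exactly `residue`."""
    return sum(1 for element in pattern.split("-") if element == residue)


print(pattern_span(CAP_GLY))      # (32, 34)
print(count_fixed(CAP_GLY, "G"))  # 5
```

Under this reading the shortest possible match is 32 residues long and five positions are fixed glycines, in agreement with the description above.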

[ 1] Riehemann K., Sorg C. Trends Biochem. Sci. 18:82-83(1993).
[0279] 73. CBD_1
Cellulose-binding domain, fungal type

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC
3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1].

[0280] Structurally, cellulases and xylanases generally consist of a catalytic domain joined to a cellulose-binding
domain (CBD) by a short linker sequence rich in proline and/or hydroxy-amino acids.

[0281] The CBD of a number of fungal cellulases has been shown to consist of 36 amino acid residues. Enzymes
known to contain such a domain are:

- Endoglucanase I (gene egl1) from Trichoderma reesei.
- Endoglucanase II (gene egl2) from Trichoderma reesei.
- Endoglucanase V (gene egl5) from Trichoderma reesei.

- Exocellobiohydrolase I (gene CBHI) from Humicola grisea, Neurospora crassa, Phanerochaete chrysosporium,
Trichoderma reesei, and Trichoderma viride.

- Exocellobiohydrolase II (gene CBHII) from Trichoderma reesei.
- Exocellobiohydrolase 3 (gene cel3) from Agaricus bisporus.
- Endoglucanases B, C2, F and K from Fusarium oxysporum.

[0282] The CBD domain is found either at the N-terminal (Cbh-II or egl2) or at the C-terminal extremity (Cbh-I, egl1
or egl5) of these enzymes. As shown in the following schematic representation, there are four conserved cysteines
in this type of CBD domain, all involved in disulfide bonds.

       +----------------+
       |          +-----|---------+
       |          |     |         |
xxxxxxxCxxxxxxxxxxCxxxxxCxxxxxxxxxCx

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.
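
The positions of the four conserved cysteines can be read off the schematic directly. The sketch below is illustrative only: it treats the printed line of 'x' and 'C' characters as the schematic, and the function names `cysteine_positions` and `has_conserved_cysteines` are hypothetical.

```python
# Schematic of the fungal-type CBD as printed: 'C' marks a conserved cysteine.
SCHEMATIC = "xxxxxxxCxxxxxxxxxxCxxxxxCxxxxxxxxxCx"


def cysteine_positions(schematic):
    """Return the 1-based positions of the conserved cysteines."""
    return [i for i, ch in enumerate(schematic, start=1) if ch == "C"]


def has_conserved_cysteines(domain, schematic=SCHEMATIC):
    """True if `domain` has the schematic's length and a C at each marked position."""
    if len(domain) != len(schematic):
        return False
    return all(domain[i - 1] == "C" for i in cysteine_positions(schematic))


print(len(SCHEMATIC))                 # 36
print(cysteine_positions(SCHEMATIC))  # [8, 19, 25, 35]
```

The schematic length of 36 agrees with the stated size of the domain. A check of this kind only tests cysteine placement; it says nothing about which cysteines pair in disulfide bonds.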



[0283] Such a domain has also been found in a putative polysaccharide binding protein from the red alga, Porphyra 
[ 1] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).

[ 2] Liu Q., van der Meer J.P., Reith M.E.

I028q 74. CBS domain. 3D Structure found as a subdomain In TIM barrel of inosine- rR.«j h • k 

doma«s are srna,, Intracellular modules r,«stly found in 2 or four cop^SSnTpreh S^ZI!^! T 
cystath«n.ne-beta-synthase (CBS) where mutations lead to homocSrurr^I^ rS ^ ^ '"""^ 
monophosphate dehydrogenase from all species howvw tt,T?^s Z^ll? ^ '^^'"^ 
domains are found in Intracellular loops of s«/e3 M^T^ ^ "'^ "^^^ ^^^'^"V Two CBS 

to homocystinuria. Number S rSrS.MiT ''"'^'^ this domain of Swiss:P35520 lead 

isrr u^pSf^roScTS^s:^^^ "^r ^'^•^ ^ 

PR. Murcko MA. WlZj^S^I sl^SsSi ^^j''^' '"^"^ °' ^^"^"^ ^A. Chambers SR Caron 

Discovery of CBS domain. 

IMilSjl «-COP<»H.Pj^,CDP.alM„apl«>stl»tldyl,mstos,, 

• ElhanoteiMncptiosphoiiBnsferase (EC 2.7.8 ,) l<m y.M (gane EPTtl 

■ SSSS?,*""""?^'""'""^'"''' '^"^ 2) t«»n (seoi CPT,) 

- Consensus pattern: D-G-x(2)-A-R-x(8)-G-x(3)-D-x(3)-D

Biochem 1996 241:489-497 nauKsson JB, Rilfors L. Arvidson G. Undblom G; Eur J 



Biochem 1996;241:489-497 

[ 1] Nikawa J.-I., Kodaki T., Yamashita S.
J. Biol. Chem. 262:4876-4881(1987).
[ 2] Hjelmstad R.H., Bell P.M.
J. Biol. Chem. 266:5094-5134(1991).




32:11507-11515. 

[0289] The following FAD flavoprotein oxidoreductases have been found [1,2] to be evolutionarily related. These
enzymes, which are called 'GMC oxidoreductases', are listed below.

- Glucose oxidase (EC 1.1.3.4) (GOX) from Aspergillus niger. Reaction catalyzed: glucose + oxygen ->
delta-gluconolactone + hydrogen peroxide.

- Methanol oxidase (EC 1.1.3.13) (MOX) from fungi. Reaction catalyzed: methanol + oxygen -> formaldehyde +
hydrogen peroxide.

- Choline dehydrogenase (EC 1.1.99.1) (CHD) from bacteria. Reaction catalyzed: choline + unknown acceptor ->
betaine aldehyde + reduced acceptor.

- Glucose dehydrogenase (GLD) (EC 1.1.99.10) from Drosophila. Reaction catalyzed: glucose + unknown acceptor
-> delta-gluconolactone + reduced acceptor.

- Cholesterol oxidase (CHOD) (EC 1.1.3.6) from Brevibacterium sterolicum and Streptomyces strain SA-COO.
Reaction catalyzed: cholesterol + oxygen -> cholest-4-en-3-one + hydrogen peroxide.

- AlkJ [3], an alcohol dehydrogenase from Pseudomonas oleovorans, which converts aliphatic medium-chain-length
alcohols into aldehydes.

This family also includes a lyase:

- (R)-mandelonitrile lyase (EC 4.1.2.10) (hydroxynitrile lyase) from plants [4], an enzyme involved in cyanogenesis,
the release of hydrogen cyanide from injured tissues.

These enzymes are proteins of size ranging from 556 (CHD) to 664 (MOX) amino acid residues which share a number
of regions of sequence similarity. One of these regions, located in the N-terminal section, corresponds to the FAD
ADP-binding domain. The function of the other conserved domains is not yet known; two of these domains have been
selected as signature patterns. The first one is located in the N-terminal section of these enzymes, about 50 residues
after the ADP-binding domain, while the second one is located in the central section.

- Consensus pattern: [GA]-[RKN]-x-[LIV]-G(2)-[GST](2)-x-[LIVM]-N-x(3)-[FYWA]-x(2)-[PAG]-x(

- Consensus pattern: [GS]-[PSTA]-x(2)-[ST]-P-x-[LIVM](2)-x(2)-S-G-[LIVM]-G

[ 1] Cavener D.R. J. Mol. Biol. 223:811-814(1992).

[ 2] Henikoff S., Henikoff J.G. Genomics 19:97-107(1994).

[ 3] van Beilen J.B., Eggink G., Enequist H., Bos R., Witholt B. Mol. Microbiol. 6:3121-3136(1992).
[ 4] Cheng I.P., Poulton J.E. Plant Cell Physiol. 34:1139-1143(1993).

[0290] 77. CKS (Cyclin-dependent kinase regulatory subunit) Number of members: 11. Cyclin-dependent kinases
(CDK) are protein kinases which associate with cyclins to regulate eukaryotic cell cycle progression. The best
known CDK is p34-cdc2 (CDC28 in yeast) which is required for entry into S-phase and mitosis. CDKs bind to a
regulatory subunit which is essential for their biological function. This regulatory subunit is a small protein of 79 to 150
residues. In yeast (gene CKS1) and in fission yeast (gene suc1) a single isoform is known, while mammals have two
highly related isoforms. It has been shown [1] that these CDK regulatory subunits assemble as a hexamer which then
acts as a hub for the oligomerization of six CDK catalytic subunits. The sequences of CDK regulatory subunits are highly
conserved; therefore, the two most conserved regions have been used as signature patterns.

- Consensus pattern: Y-S-x-[KR]-Y-x-[DE](2)-x-[FY]-E-Y-R-H-V-x-[LV]-[PT]-[KRP]

- Consensus pattern: H-x-P-E-x-H-[IV]-L-L-F-[KR]
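
Consensus patterns written in this notation can be translated mechanically into regular expressions for scanning a protein sequence. The following is a minimal sketch, not a full implementation of the pattern syntax: it handles only single residues, 'x' wildcards, bracketed alternatives and "(n)" or "(n,m)" repeats, and it assumes the two CKS patterns read as shown in the code. The function name `prosite_to_regex` and the test string are hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, n_min, n_max = m.groups()
        regex = "." if core == "x" else core     # 'x' stands for any residue
        if n_min and n_max:
            regex += "{%s,%s}" % (n_min, n_max)  # x(2,4) -> .{2,4}
        elif n_min:
            regex += "{%s}" % n_min              # x(3)   -> .{3}
        parts.append(regex)
    return "".join(parts)


# Assumed readings of the two CKS signature patterns.
CKS_1 = "Y-S-x-[KR]-Y-x-[DE](2)-x-[FY]-E-Y-R-H-V-x-[LV]-[PT]-[KRP]"
CKS_2 = "H-x-P-E-x-H-[IV]-L-L-F-[KR]"

print(prosite_to_regex(CKS_2))  # H.PE.H[IV]LLF[KR]

# Synthetic string built to satisfy CKS_2; it is not a real protein sequence.
toy = "AAAHAPEAHILLFKAAA"
print(re.search(prosite_to_regex(CKS_2), toy).start())  # 3
```

Elements such as excluded residues in braces or anchors are rejected with an error rather than silently mistranslated, which seems the safer choice for a sketch of this size.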

[0291] [ 1] Parge H.E., Arvai A.S., Murtari D.J., Reed S.I., Tainer J.A. Science 262:387-395(1993).
[0292] 78. CK_II_beta (Casein kinase II regulatory subunit)

Number of members: 16. Casein kinase II (CK-2) [1] is a ubiquitous eukaryotic serine/threonine protein kinase which
is found both in the cytoplasm and the nucleus and whose substrates are numerous. It generally phosphorylates Ser
or Thr at the N-terminal of a stretch of acidic residues (see <PDOC00006>). CK-2 exists as a heterotetramer composed
of two catalytic subunits (alpha) and two regulatory subunits (beta). In most species there are two closely related
isoforms of the catalytic subunit: alpha and alpha'. Some species, such as fungi and plants, express two forms of
regulatory subunits: beta and beta'. The exact function of the regulatory subunit is not yet known. It is a highly conserved
protein of about 25 Kd that contains, in its central section, a cysteine-rich motif that could be involved in binding a metal
such as zinc [2]. This region has been used as a signature pattern.

- Consensus pattern: C-P-x-[LIVMY]-x-C-x(5)-[LI]-P-[LIVMC]-G-x(9)-V-[KR]-x(2)-C-P-x-C

[ 1] Allende J.E., Allende C.C. FASEB J. 9:313-323(1995).
[ 2] Reed J.C., Bidwai A.P., Glover C.V.C. J. Biol. Chem. 269:18192-18200(1994).
[0293] 79. CLP_protease (Clp protease)

These proteins belong to family S14 in the classification of peptidases.



- The Clp protease has an active site catalytic triad. In E. coli Clp protease, ser-111, his-136 and asp-185 form
the catalytic triad.

.Sw«sP42379co„,a.ns,wo^e«sertions.Swiss:P42380con,ainsonelargein^^^ 
Of two related ATP4,inding 4„Lo5 siilTcgrs^^^^^^ 

twsin-like activity. Its catalytic activity seems o be prt iS^L L 2i«„^rl^^^ " ' ^ 
family of serine proteases, but which evolved bv i^oZl!J2l^ ^ ^ ^ 'he trypsin 

have been found to be encoded in the gero-^o ^TT^^^H"^ toCIpP 
eukaryotes. The sequences arounS l^ J tSr^LI^t!^ ol Pbnts and seem to be also present in other 
hi^^ccse^ed a^^can be .sedas^g^:: s^TIo^^^e^r^^^^^^ ^ ^^"'^^ 

- Consensus pattern: T-x(2)-{LIVMF]-G-x-A-{SACfS-{MSAl-rPAGl-ISTAl f.«5 « .h- = .• « - 

- consensus pattern: R-x(3HEAP^x{3HLIvUUu^LK2^^^^^^^ L'^re^rer 

[IJMedline: 98050920. The structure of CIdP at o q . - 

proteo^sis. Wang J. Hart.^g JA^'S^'gS' v^S ^"^^"'^ ^ ATP^Iapendent 

[ 1] Maurizi M.R., Clark W.P., Kim S.-H., Gottesman S. J. Biol. Chem. 265:12546-12552(1990).
[ 2] Gottesman S., Maurizi M.R. Microbiol. Rev. 56:592-621(1992).

[ 3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

SlinfgSS^sa'^!!"' ^''TT"*^'^ '^^^ Nucleotide Gated Channel) 

ZZllT^^X.ZT"^'^'^'"''^ anexpand^gnewfamiVo, ionLnnels. Yau KW; Proc Natl 

r«-S2d^— ^^^^^^^ 

Such a domain is known to exist in the (oltowSg pSeins ^-S^l-stranded. antiparalle. beta-barrel structure. 

- Prokaryotic catabolite gene activator protein (CAP) 

' or-cy^crreois:^^^^ • 

a regulatory chain which contains both c^^esTme ^ '^^^^ and 

the two copies of the domain in their N-tenninal sfirfirTrhl i . ^ '^"^ ^^"^^^ that include 

an amino acid in the conserved rX J bTlTi^r- .1^ '^^^ °' '^'''^ ^^^^ *^ 'o 

cAPK. °' beta-barrel 7. a threonine that is invariant In cGPK is an alanine in most 

openingofthechannelandthLlycaulgaCS^^ 

cAMP-binding. channel plays a role in oi^am^Sr!i^'^°^^'°'*- 

domain, three of whbh J ^cZZS^^^^^fT '"^^"^ ^'"'"^ ^'^'^ '^''^ 

integrity Of the beta-barrel. TvSsignatuTittemf ha^J^^^ tfr"'^' the structural 
Within beta-barrels and 3 and ccJS^ nTtS ^ ^tT^"" P«"«"^ ">^ted 

barre.6a^7andcc«ta.sthethirdreroT:r:rt^^^^^^ 

Consensus pattern: |LIVMh[VICJ-x(2)-G-[DENQTAI.x-IGACI-x(2)-fLIVMFYlf4> xf?» r 
consensus pattern: lL.VMP^G.E-x-,GASHLIVVq-xkl1)-R-lsSCx-^Z^^^ 

[ 1] Weber I.T., Shabb J.B., Corbin J.D. Biochemistry 28:6122-6127(1989).

[ 2] Kaupp U.B. Trends Neurosci. 14:150-157(1991).
[ 3] Shabb J.B., Corbin J.D. J. Biol. Chem. 267:5723-5726(1992).

[0296] 81. COX10_ctaB_cyoE (Cytochrome c oxidase assembly factor)
[1]

Medline: 95191390

Biosynthesis and functional role of haem O and haem A.
Mogi T, Saiki K, Anraku Y;
Mol Microbiol 1994;14:391-398.

Cytochrome c oxidase is a multi-subunit enzyme. The complexity of this enzyme requires assistance in building the
complex.

This is carried out by the cytochrome c oxidase assembly factor.
Number of members: 31

[0297] Cytochrome c oxidase is an oligomeric enzymatic complex which seems to require the aid of a number of
proteins that either act as chaperonins to help the subunits of the enzyme to fold correctly, or assist in the assembly
of the metal centers [1]. One of these proteins is known as COX10 in yeast and as ctaB [2] in aerobic prokaryotes. It
is evolutionarily related to the cyoE protein from the Escherichia coli cytochrome O terminal oxidase complex.
[0298] These proteins probably contain [3] seven transmembrane segments. The most conserved region is located
in a loop between the second and third of these segments and has been selected as a signature pattern.

- Consensus pattern: [ED]-x-D-x(2)-M-x-R-T-x(2)-R-x(4)-G

[ 1] Nobrega M.P., Nobrega F.G., Tzagoloff A.
J. Biol. Chem. 265:14220-14226(1990).
[ 2] Cao J., Hosler J., Shapleigh J., Revzin A., Ferguson-Miller S.
J. Biol. Chem. 267:24273-24278(1992).
[ 3] Chepuri V., Gennis R.B.
J. Biol. Chem. 265:12978-12986(1990).

[0299] 82. COX3 (Cytochrome c oxidase subunit III)
This family corresponds to chains C and P.
[1]

Medline: 96216288
The whole structure of the 13-subunit oxidized cytochrome c
oxidase at 2.8 A.
Tsukihara T, Aoyama H, Yamashita E, Tomizaki T, Yamaguchi H,
Shinzawa-Itoh K, Nakashima R, Yaono R, Yoshikawa S;
Science 1996;272:1136-1144.
Number of members: 224

[0300] 83. COX5B (Cytochrome c oxidase subunit Vb)
[1]

Medline: 96216288

The whole structure of the 13-subunit oxidized cytochrome c oxidase at 2.8 A.
Tsukihara T, Aoyama H, Yamashita E, Tomizaki T, Yamaguchi H, Shinzawa-Itoh K, Nakashima R, Yaono R, Yoshikawa S;
Science 1996;272:1136-1144.
This family consists of chains F and S.
Number of members: 10

[0301] Cytochrome c oxidase (EC 1.9.3.1) [1] is an oligomeric enzymatic complex which is a component of the
respiratory chain complex and is involved in the transfer of electrons from cytochrome c to oxygen. In eukaryotes this
enzyme complex is located in the mitochondrial inner membrane; in aerobic prokaryotes it is found in the plasma
membrane. In addition to the three large subunits that form the catalytic center of the enzyme complex there are, in
eukaryotes, a variable number of small polypeptidic subunits. One of these subunits, which is known as Vb in mammals,
V in slime mold and IV in yeast, binds a zinc atom. The sequence of subunit Vb is well conserved and includes three
conserved cysteines that are thought to coordinate the zinc ion [2]. Two of these cysteines are clustered in the
C-terminal section of the subunit; this region has been selected as a signature pattern.

- Consensus pattern: [LIVM](2)-[FYW]-x(10)-C-x(2)-C-G-x(2)-[FY]-K-L [The two C's probably bind zinc]



[ 1] Capaldi R.A., Malatesta F., Darley-Usmar V.M. Biochim. Biophys. Acta 726:135-146(1983).

[ 2] Rizzuto R., Sandona D., Brini M., Capaldi R.A., Bisson R. Biochim. Biophys. Acta 1129:100-104(1991).

[0302] 84. COesterase (Carboxylesterases) 
Cholinesterase pages 

The prints entry is specific to acetylcholinesterase 
Number of members: 273 

esiere (EC 3. 1 .1 .-). Carboxyl-esterases have bean classified into three cateoories R »nH r\ .k- u . 7^ 

nlirrr °' -^9^^^^^^. -me sec^enceSTn^r't^e^^^lyTe?:^^^^^^^ 

II .2.3] that the maprrty are evolutionary related. This family currently consists d tSe foltot^^S 

- Acetylcholinesterase (EC 3.1.1.7) (AChE) [E1] from vertebrates and from Drosophila.

- Cholinesterase II (butyryl cholinesterase) (EC 3.1.1.8). Acetylcholinesterase and cholinesterase II are
closely related enzymes that hydrolyze choline esters [4].

- Mammalian liver microsomal carboxylesterases (EC 3.1.1.1).

- Drosophila esterase 6, produced in the anterior ejaculatory duct of the male insect reproductive system where it
plays an important role in its reproductive biology.

- Drosophila esterase P.

- Culex pipiens (mosquito) esterases B1 and B2.

- Myzus persicae (peach-potato aphid) esterases E4 and FE4.

- Mammalian bile-salt-activated lipase (BAL) [5], a multifunctional lipase which catalyzes fat and vitamin absorption.
It is activated by bile salts in infant intestine where it helps to digest milk fats.

- Insect juvenile hormone esterase (JH esterase) (EC 3.1.1.59).

- Lipases (EC 3.1.1.3) from the fungi Geotrichum candidum and Candida rugosa.

- Caenorhabditis gut esterase (gene ges-1).

I03CVq So far two bacterial proteins have been found to belong to this family: 

" (Phenylcarbamate hydrolase), an Arthrobacteroxidans plasmid-encoded enzvme taene 

srr.Sg^^'^^"^''^*^"^^^^^^^^^^ 

- Para-nitrobenzyl esterase from Bacillus subtilis (gene pnbA). 

StSLxJ^etteSr^^^^^^^^^ ^"^ '^^'"^ ^•^^^ activity, conta^ a domah evobtionary rented to that 

■ du^veCr"'""" ""^ ^''^'^ ""^^'^ "^"'^'^ "^t-en embryonic celte 

- Drosophila protein glutactin (gene gll), whose function is not known. 

[ 1] Myers M., Richmond R.C., Oakeshott J.G. Mol. Biol. Evol. 5:113-119(1988).
[ 2] Krejci E., Duval N., Chatonnet A., Vincens P., Massoulie J. Proc. Natl. Acad. Sci. U.S.A. 88:6647-6651(1991).
[ 3] Cygler M., Schrag J.D., Sussman J.L., Harel M., Silman I., Gentry M.K., Doctor B.P. Protein Sci. 2:366-382(1993).

[ 4] Lockridge O. BioEssays 9:125-128(1988).

[ 5] Wang C.-S., Hartsuck J.A. Biochim. Biophys. Acta 1166:1-19(1993).

[0307] 85. CPSase_L_chain (Carbamoyl-phosphate synthase (CPSase))
[1]

Medline: 94347758

Three-dimensional structure of the biotin carboxylase subunit of acetyl-CoA carboxylase.
Waldrop GL, Rayment I, Holden HM;
Biochemistry 1994;33:10249-10256.

[2]

Medline: 90285162

Mammalian carbamyl phosphate synthetase (CPS). DNA sequence and evolution of the CPS domain of the Syrian
hamster multifunctional protein CAD.
Simmer JP, Kelly RE, Rinker AG Jr, Scully JL, Evans DR;
J Biol Chem 1990;265:10395-10402.
Carbamoyl-phosphate synthase catalyzes the ATP-dependent synthesis of carbamyl-phosphate from glutamine or
ammonia and bicarbonate. This important enzyme initiates both the urea cycle and the biosynthesis of arginine and/or
pyrimidines [2].

The carbamoyl-phosphate synthase (CPS) enzyme in prokaryotes is a heterodimer of a small and large chain. The
small chain promotes the hydrolysis of glutamine to ammonia, which is used by the large chain to synthesize carbamoyl
phosphate. See CPSase_sm_chain.

The small chain has a GATase domain in the carboxyl terminus.
See GATase.
Number of members: 181

[0308] Carbamoyl-phosphate synthase (CPSase) catalyzes the ATP-dependent synthesis of carbamyl-phosphate
from glutamine (EC 6.3.5.5) or ammonia (EC 6.3.4.16) and bicarbonate [1]. This important enzyme initiates both the
urea cycle and the biosynthesis of arginine and pyrimidines.

[0309] Glutamine-dependent CPSase (CPSase II) is involved in the biosynthesis of pyrimidines and purines. In
bacteria such as Escherichia coli, a single enzyme is involved in both biosynthetic pathways while other bacteria have
separate enzymes. The bacterial enzymes are formed of two subunits: a small chain (gene carA) that provides
glutamine amidotransferase activity (GATase) necessary for removal of the ammonia group from glutamine, and a
large chain (gene carB) that provides CPSase activity. Such a structure is also present in fungi for arginine biosynthesis
(genes CPA1 and CPA2). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large
multifunctional enzyme - called URA2 in yeast, rudimentary in Drosophila and CAD in mammals [2]. The CPSase
domain is located between an N-terminal GATase domain and the C-terminal part which encompasses the dihydroorotase
and aspartate transcarbamylase activities.

[0310] Ammonia-dependent CPSase (CPSase I) is involved in the urea cycle in ureolytic vertebrates; it is a
monofunctional protein located in the mitochondrial matrix.

[0311] The CPSase domain is typically 120 Kd in size and has arisen from the duplication of an ancestral subdomain
of about 500 amino acids. Each subdomain independently binds to ATP and it is suggested that the two homologous
halves act separately, one to catalyze the phosphorylation of bicarbonate to carboxy phosphate and the other that of
carbamate to carbamyl phosphate.

[0312] The CPSase subdomain is also present in a single copy in the biotin-dependent enzymes acetyl-CoA
carboxylase (EC 6.4.1.2) (ACC), propionyl-CoA carboxylase (EC 6.4.1.3) (PCCase), pyruvate carboxylase (EC 6.4.1.1)
(PC) and urea carboxylase (EC 6.3.4.6).

[0313] Two conserved regions which are probably important for binding ATP and/or catalytic activity have been
selected as signatures for the subdomain.



- Consensus pattern: [FYV]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-[STA]-x(3)-[SG]-G-x-[AG]
- Consensus pattern: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC]



[ 1] Simmer J.P., Kelly R.E., Rinker A.G. Jr., Scully J.L., Evans D.R.
J. Biol. Chem. 265:10395-10402(1990).
[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B.
BioEssays 15:157-164(1993).

[0314] 86. CPSase_sm_chain (Carbamoyl-phosphate synthase small chain, CPSase domain) 
[1]

Medline: 90285162

Mammalian carbamyl phosphate synthetase (CPS). DNA sequence and evolution of the CPS domain of the Syrian
hamster multifunctional protein CAD.

Simmer JP, Kelly RE, Rinker AG Jr, Scully JL, Evans DR;
J Biol Chem 1990;265:10395-10402.

The carbamoyl-phosphate synthase domain is in the amino terminus of the protein.
Carbamoyl-phosphate synthase catalyzes the ATP-dependent synthesis of carbamyl-phosphate from glutamine or
ammonia and bicarbonate. This important enzyme initiates both the urea cycle and the biosynthesis of arginine and/or
pyrimidines [1].

The carbamoyl-phosphate synthase (CPS) enzyme in prokaryotes is a heterodimer of a small and large chain. The
small chain promotes the hydrolysis of glutamine to ammonia, which is used by the large chain to synthesize carbamoyl
phosphate. See CPSase_L_chain.

The small chain has a GATase domain in the carboxyl terminus.

See GATase.

Number of members: 46

[0315] Carbamoyl-phosphate synthase (CPSase) catalyzes the ATP-dependent synthesis of carbamyl-phosphate
from glutamine (EC 6.3.5.5) or ammonia (EC 6.3.4.16) and bicarbonate [1]. This important enzyme initiates both the
urea cycle and the biosynthesis of arginine and pyrimidines.

[0316] Glutamine-dependent CPSase (CPSase II) is involved in the biosynthesis of pyrimidines and purines. In
bacteria such as Escherichia coli, a single enzyme is involved in both biosynthetic pathways while other bacteria have
separate enzymes. The bacterial enzymes are formed of two subunits: a small chain (gene carA) that provides
glutamine amidotransferase activity (GATase) necessary for removal of the ammonia group from glutamine, and a
large chain (gene carB) that provides CPSase activity. Such a structure is also present in fungi for arginine biosynthesis
(genes CPA1 and CPA2). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large
multifunctional enzyme - called URA2 in yeast, rudimentary in Drosophila and CAD in mammals [2]. The CPSase
domain is located between an N-terminal GATase domain and the C-terminal part which encompasses the dihydroorotase
and aspartate transcarbamylase activities.

[0317] Ammonia-dependent CPSase (CPSase I) is involved in the urea cycle in ureolytic vertebrates; it is a
monofunctional protein located in the mitochondrial matrix.

[0318] The CPSase domain is typically 120 Kd in size and has arisen from the duplication of an ancestral subdomain
of about 500 amino acids. Each subdomain independently binds to ATP and it is suggested that the two homologous
halves act separately, one to catalyze the phosphorylation of bicarbonate to carboxy phosphate and the other that of
carbamate to carbamyl phosphate.

[0319] The CPSase subdomain is also present in a single copy in the biotin-dependent enzymes acetyl-CoA
carboxylase (EC 6.4.1.2) (ACC), propionyl-CoA carboxylase (EC 6.4.1.3) (PCCase), pyruvate carboxylase (EC 6.4.1.1)
(PC) and urea carboxylase (EC 6.3.4.6).

[0320] Two conserved regions which are probably important for binding ATP and/or catalytic activity have been
selected as signatures for the subdomain.

- Consensus pattern: [FYV]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-[STA]-x(3)-[SG]-G-x-[AG]

- Consensus pattern: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC]

[ 1] Simmer J.P., Kelly R.E., Rinker A.G. Jr., Scully J.L., Evans D.R. J. Biol. Chem. 265:10395-10402(1990).
[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).

[0321] 87. CRAL_TRIO (CRAL/TRIO domain)

[1]

Medline: 98121119

Crystal structure of the Saccharomyces cerevisiae phosphatidylinositol-transfer protein.

Sha B, Phillips SE, Bankaitis VA, Luo M;
Nature 1998;391:506-510.

The original profile has been extended to include the carboxyl domain from the known structure of Sec14. Swiss:P10911
has not been included in the Pfam family because it does not appear to contain a complete structural domain.
Number of members: 39

[0322] 88. CSD ('Cold-shock' DNA-binding domain)
[1]

Medline: 94255482
Crystal structure of CspA, the major cold shock protein of Escherichia coli.
Schindelin H, Jiang W, Inouye M, Heinemann U;
Proc Natl Acad Sci U S A 1994;91:5119-5123.
Number of members: 121

[0323] A conserved domain of about 70 amino acids has been found in prokaryotic and eukaryotic DNA-binding
proteins [1,2,3,E1]. This domain, which is known as the 'cold-shock domain' (CSD), is present in the proteins listed below.

- Escherichia coli protein CS7.4 (gene cspA) which is induced in response to low temperature (cold-shock protein)
and which binds to and stimulates the transcription of the CCAAT-containing promoters of the H-NS protein and
of gyrA.

- Mammalian Y box binding protein 1 (YB1). A protein that binds to the CCAAT-containing Y box of mammalian HLA
class II genes.

- Xenopus Y box binding proteins -1 and -2 (Y1 and Y2). Proteins that bind to the CCAAT-containing Y box of
Xenopus hsp70 genes.

- Xenopus B box binding protein (YB3). YB3 binds the B box promoter element of genes transcribed by RNA
polymerase III.

- Enhancer factor I subunit A (EFI-A) (dbpB). A protein that also binds to the CCAAT motif in various gene promoters.
- DbpA, a human DNA-binding protein of unknown specificity.

- Bacillus subtilis cold-shock proteins cspB and cspC.

- Streptomyces clavuligerus protein SC 7.0.

- Escherichia coli proteins cspB, cspC, cspD, cspE and cspF.

- Unr, a mammalian gene encoded upstream of the N-ras gene. Unr contains nine repeats that are similar to the
CSD domain. The function of Unr is not yet known but it could be a multivalent DNA-binding protein.

[0324] As a signature pattern for the CSD domain, its most conserved region, which is located in its N-terminal section,
has been selected. It must be noted that the beginning of this region is highly similar [4] to the RNP-1 RNA-binding motif.

- Consensus pattern: [FY]-G-F-I-x(6,7)-[DER]-[LIVM]-F-x-H-x-[STKR]-x-[LIVMFY]
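
The consensus patterns in this description follow the PROSITE notation: residues separated by hyphens, `x` for any residue, square brackets for a set of allowed residues, curly braces for excluded residues, and a parenthesised count or range for repeats. As a minimal sketch, assuming plain one-letter protein sequences, such a pattern could be translated into a regular expression and used to scan a sequence. The helper name `prosite_to_regex` and the test sequence are hypothetical illustrations, not part of the original disclosure.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper: handles literal residues, x, [..], {..},
    and repeat suffixes such as (2) or (6,7).
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split the element into its core and an optional repeat suffix.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."                       # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                      # allowed set, same syntax in regex
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"   # excluded set
        else:
            token = re.escape(core)           # single literal residue
        if low and high:
            token += "{%s,%s}" % (low, high)
        elif low:
            token += "{%s}" % low
        parts.append(token)
    return "".join(parts)


# CSD signature as given in the text above.
CSD_PATTERN = "[FY]-G-F-I-x(6,7)-[DER]-[LIVM]-F-x-H-x-[STKR]-x-[LIVMFY]"
CSD_REGEX = re.compile(prosite_to_regex(CSD_PATTERN))

# Synthetic sequence built only to satisfy the pattern (not a real protein).
synthetic = "MSYGFIAAAAAADLFAHASALKK"
hit = CSD_REGEX.search(synthetic)
```

Under these assumptions `hit.start()` gives the zero-based position of the first residue of the match, and the same helper can be applied to the other consensus patterns listed in this section.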

[ 1] Doniger J., Landsman D., Gonda M.A., Wistow G.
New Biol. 4:389-395(1992).
[ 2] Wistow G.
Nature 344:823-824(1990).
[ 3] Jones P.G., Inouye M.
Mol. Microbiol. 11:811-818(1994).
[ 4] Landsman D.
Nucleic Acids Res. 20:2861-2864(1992).

[0325] 89. CTF_NFI (CTF/NF-I family)
Number of members: 45 

[0326] Nuclear factor I (NF-I) or CCAAT box-binding transcription factor (CTF) [1,2] (also known as TGGCA-binding
proteins) are a family of vertebrate nuclear proteins which recognize and bind, as dimers, the palindromic DNA
sequence 5'-TGGCANNNTGCCA-3'. CTF/NF-I binding sites are present in viral and cellular promoters and in the origin
of DNA replication of Adenovirus type 2.
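
The recognition sequence 5'-TGGCANNNTGCCA-3' is palindromic in the sense that its two half-sites, TGGCA and TGCCA, are reverse complements of each other around a three-base spacer. As an illustrative sketch only, a DNA string could be scanned for exact matches to this consensus as follows. The function names and the example sequence are hypothetical, and real binding sites may deviate from the consensus.

```python
import re

# Exact-match form of the consensus given above; N is any of A, C, G, T.
NFI_SITE = re.compile(r"TGGCA[ACGT]{3}TGCCA")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_nfi_sites(dna: str) -> list:
    """Return zero-based start positions of non-overlapping consensus matches."""
    return [m.start() for m in NFI_SITE.finditer(dna.upper())]


# Made-up example sequence containing one consensus site.
example = "AAATGGCAGATTGCCATTT"
positions = find_nfi_sites(example)
```

Because the half-sites are reverse complements, a site found on one strand is also found at the mirrored position on the opposite strand, so scanning a single strand is sufficient for this exact consensus.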

[0327] The CTF/NF-I proteins were first identified as nuclear factor I, a collection of proteins that activate the
replication of several Adenovirus serotypes (together with NF-II and NF-III) [3]. The family of proteins was also identified
as the CTF transcription factors, before the NFI and CTF families were found to be identical [4]. The CTF/NF-I proteins
are individually capable of activating transcription and DNA replication. The CTF/NF-I family name has also been
dubbed as NFI, NF-I or NFI.

[0328] In a given species, there are a large number of different CTF/NF-I proteins. The multiplicity of CTF/NF-I is
known to be generated both by alternative splicing and by the occurrence of four different genes. The known forms of
NF-I genes have been classified as:

- The CTF-like factors subfamily (prototype form: CTF-1) [4].
- The NFI-X proteins.
- The NFI-A proteins.
- The NFI-B proteins.
^ ?^^P , "^^'^ '° transcription and replication activities

[0330] CTF/NF-I proteins contain 400 to 600 amino acids. The N-terminal 200 amino-acid sequence, almost perfectly
conserved in all species and genes sequenced, mediates site-specific DNA recognition, protein dimerization and
Adenovirus DNA replication. The C-terminal 100 amino acids contain the transcriptional activation domain.

SedXl^irgTt^ferrf^Lt^^^^^^^^ 
- Consensus pattern: R-K-R-K-Y-F-K-K-H-E-K-R 

[ 1] Mermod N., O'Neill E.A., Kelly T.J., Tjian R.
Cell 58:741-753(1989).

[ 2] Rupp R.A.W., Kruse U., Multhaup G., Goebel U., Beyreuther K.,
Sippel A.E.
Nucleic Acids Res. 18:2607-2616(1990).
[ 3] Nagata K., Guggenheimer R.A., Enomoto T., Lichy J.H., Hurwitz J.
Proc. Natl. Acad. Sci. U.S.A. 79:6438-6442(1982).
[ 4] Santoro C., Mermod N., Andrews P.C., Tjian R.
Nature 334:218-224(1988).
[ 5] Gil G., Smith J.R., Goldstein J.L., Slaughter C.A., Orth K., Brown M.S., Osborne T.F.
Proc. Natl. Acad. Sci. U.S.A. 85:8963-8967(1988).

[ 6] Alevizopoulos A., Dusserre Y., Tsai-Pflugfelder M., von der Weid T., Wahli W., Mermod N.
Genes Dev. 9:3051-3066(1995).

[0332] 90. Calsequestrin (Calsequestrin) 
Number of members: 13

- Consensus pattern: [EQ]-[DE]-G-L-[DN]-F-P-x-Y-D-G-x-D-R-V

- Consensus pattern: [DE]-L-E-D-W-[LIVM]-E-D-V-L-x-G-x-[LIVM]-N-T-E-D-D-D

[0335] [ 1] Treves S., Vilsen B., Chiozzi P., Andersen J.P., Zorzato F.

Biochem. J. 283:767-772(1992). 
[0336] 91. Carboxyl_trans (Carboxyl transferase domain)

Medline: 93374821 

Thornton CG, Kumar GK, Haase FC, Phillips NF, Woo SB, Park VM,
Magner WJ, Shenoy BC, Wood HG, Samols D; J Bacteriol 1993;175:5301-5308.

Medline: 93358891

Molecular evolution of biotin-dependent carboxylases.
Toh H, Kondo H, Tanabe T;
Eur J Biochem 1993;215:687-696.
All of the members in this family are biotin-dependent carboxylases.
All of the members in this family utilise acyl-CoA as the acceptor molecule.
Number of members: 47 

[0337] 92. Chal_sti_synt (Chalcone and stilbene synthases)
Number of members: 146

[0338] Chalcone synthases (CHS) (EC 2.3.1.74) and stilbene synthases (STS) (formerly known as resveratrol
synthases) are related plant enzymes [1]. CHS is an important enzyme in flavonoid biosynthesis and STS a key enzyme
in stilbene-type phytoalexin biosynthesis. Both enzymes catalyze the addition of three molecules of malonyl-CoA to a
starter CoA ester (a typical example is 4-coumaroyl-CoA), producing either a chalcone (with CHS) or a stilbene (with
STS).

[0339] These enzymes are proteins of about 390 amino-acid residues. A conserved cysteine residue, located in the
central section of these proteins, has been shown [2] to be essential for the catalytic activity of both enzymes and
probably represents the binding site for the 4-coumaroyl-CoA group. The region around this active site residue is well
conserved and can be used as a signature pattern.

[0340] In addition to the plant enzymes, this family also includes Bacillus subtilis bcsA.

- Consensus pattern: R-[LIVMFYS]-x-[LIVM]-x-[QHG]-x-G-C-[FYNA]-[GA]-G-[GA]-[STAV]-x-[LIVMF]-[RA] [C is the
active site residue]

[ 1] Schroeder J., Schroeder G.
Z. Naturforsch. 45C:1-8(1990).

[ 2] Lanz T., Tropf S., Marner F.-J., Schroeder J., Schroeder G.
J. Biol. Chem. 266:9971-9976(1991).

[0341] 93. Chorismate_synt (Chorismate synthase) 

Number of members: 19

[0342] Chorismate synthase (EC 4.6.1.4) catalyzes the last of the seven steps in the shikimate pathway, which is
used in prokaryotes, fungi and plants for the biosynthesis of aromatic amino acids. It catalyzes the 1,4-trans elimination
of the phosphate group from 5-enolpyruvylshikimate-3-phosphate (EPSP) to form chorismate, which can then be used
in phenylalanine, tyrosine or tryptophan biosynthesis. Chorismate synthase requires the presence of a reduced flavin
mononucleotide (FMNH2 or FADH2) for its activity.

[0343] Chorismate synthase from various sources shows [1,2] a high degree of sequence conservation. It is a protein
of about 360 to 400 amino-acid residues. Three signature patterns have been developed from conserved regions rich
in basic residues (mostly arginines). The first is in the N-terminal section, the second is central and the third is C-terminal.

- Consensus pattern: G-E-S-H-[GC]-x(2)-[LIVM]-[GTV]-x-[LIVM](2)-[DE]-G-x-[PV]

- Consensus pattern: [GE]-R-[SA](2)-[SAG]-R-[EV]-[ST]-x(2)-[RH]-V-x(2)-G

- Consensus pattern: R-[SH]-D-[PSV]-[CSAV]-x(4)-[GAI]-x-[IVGSP]-[LIVM]-x-E-[STAH]-[LIVM]

[ 1] Schaller A., Schmid J., Leibinger U., Amrhein N.
J. Biol. Chem. 266:21434-21438(1991).

[ 2] Jones D.G.L., Reusser U., Braus G.H.
Mol. Microbiol. 5:2143-2152(1991).

[0344] 94. Clat_adaptor_s (Clathrin adaptor complex small chain)

Number of members: 21

[0345] Clathrin coated vesicles (CCV) mediate intracellular membrane traffic such as receptor mediated endocytosis.
In addition to clathrin, the CCV are composed of a number of other components including oligomeric complexes which
are known as adaptor or clathrin assembly proteins (AP) complexes [1]. The adaptor complexes are believed to interact
with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. In mammals two types of
adaptor complexes are known: AP-1, which is associated with the Golgi complex, and AP-2, which is associated with
the plasma membrane. Both AP-1 and AP-2 are heterotetramers that consist of two large chains - the adaptins -
(gamma and beta' in AP-1; alpha and beta in AP-2); a medium chain (AP47 in AP-1; AP50 in AP-2) and a small chain
(AP19 in AP-1; AP17 in AP-2).

[0346] The small chains of AP-1 and AP-2 are evolutionarily related proteins of about 18 Kd. Homologs of AP17 and
AP19 have also been found in yeast (genes APS1/YAP19 and APS2/YAP17) [2,3,4]. AP17 and AP19 are also related
to the zeta-chain [5] of coatomer (zeta-cop), a cytosolic protein complex that reversibly associates with Golgi
membranes to form vesicles that mediate biosynthetic protein transport from the endoplasmic reticulum, via the Golgi up
to the trans Golgi network.
[0347] A conserved region in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern: [LIVM](2)-Y-[KR]-x(4)-L-Y-F

[ 1] Pearse B.M., Robinson M.S.
Annu. Rev. Cell Biol. 6:151-171(1990).

[ 2] Kirchhausen T., Davis A.C., Frucht S., O'Brine Greco B., Payne G.S.,
Tubb B.
J. Biol. Chem. 266:11153-11157(1991).
[ 3] Nakai M., Takada T., Endo T.
Biochim. Biophys. Acta 1174:282-284(1993).
[ 4] Phan H.L., Finlay J.A., Chu D.S., Tan P.K., Kirchhausen T., Payne G.S.
EMBO J. 13:1706-1717(1994).

[ 5] Kuge O., Hara-Kuge S., Orci L., Ravazzola M., Amherdt M., Tanigawa G.,
Wieland F.T., Rothman J.E.
J. Cell Biol. 123:1727-1734(1993).

[0348] 95. Clathrin_lg_ch (Clathrin light chain)
Number of members: 8 

[0349] Clathrin |1 .2] is the major coat-forming protein that encloses vesicles such as coated oits and inrm. .^n 

- In higher eukaryotes two genes code for distinct but related light chains: LC(a) and LC(b). Each of the two genes
can yield, by tissue-specific alternative splicing, two separate forms which differ by the insertion of a sequence of
respectively thirty or eighteen residues. There is, in the N-terminal part of the clathrin light chains, a domain of
twenty one amino acid residues which is perfectly conserved between LC(a) and LC(b).

' JlJSr" " ' """" """^ ^"^^ '''■'''^ ''^^^^ '«'ated to that o. higher eu- 

[0351] Two signature patterns have been developed for clathrin light chains.

- Consensus pattern: F-L-A-Q-Q-E-S

[ 1] Keen J.H.
Annu. Rev. Biochem. 59:415-438(1990).
[ 2] Brodsky F.M.
Science 242:1396-1402(1988).
[ 3] Brodsky F.M., Hill B.L., Acton S.L., Naethke I., Wong D.H.,
Ponnambalam S., Parham P.
Trends Biochem. Sci. 16:208-213(1991).

[0352] 96. (Clathrin repeat) 7-fold repeat in Clathrin and VPS

it::^:^^^^^^^ \t ^"'"^ ^^^^^^ ^^^^^^^ ^-vy cha^. 

Medline: 92191269 

Folding and trimerization of clathrin subunits at the triskelion hub 
Nathke IS, Heuser J, Lupas A, Stock J, Turck CW, Brodsky FM;
Cell 1992;68:899-910.
[2]
Medline: 88097376

Clathrin heavy chain: molecular cloning and complete primary structure.
Kirchhausen T, Harrison SC, Chow EP, Mattaliano RJ,
Ramachandran KL, Smart J, Brosius J;
Proc Natl Acad Sci U S A 1987;84:8805-8809.
[0353] 97. Collagen (Collagen triple helix repeat (20 copies)) 
[1] Medline: 94059583 
New members of the collagen superfamily 
Mayne R. Brewton RG; 
Curr Opin Cell Biol 1993;5:883-890.
Scurvy is associated with collagens.

Members of this family belong to the collagen superfamily [1].

Collagens are generally extracellular structural proteins involved in formation of connective tissue structure. 
The alignment contains 20 copies of the G-X-Y repeat that forms a triple helix. The first position of the repeat is glycine, 
the second and third positions can be any residue but are frequently proline and hydroxyproline. Collagens are
post-translationally modified by proline hydroxylase to form the hydroxyproline residues. Defective hydroxylation is the cause
of scurvy. 
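
The G-X-Y repeat described above lends itself to a simple scan: a run of consecutive triplets, each starting with glycine, marks a candidate triple-helix region. The following is a minimal sketch under the assumption of a one-letter protein string; the function name `longest_gxy_run` and the example sequences are hypothetical and are not taken from the alignment itself.

```python
import re

# Lookahead so that runs starting at every position (and in every
# reading frame) are considered, not only non-overlapping ones.
_GXY_RUN = re.compile(r"(?=((?:G[A-Z]{2})+))")


def longest_gxy_run(protein: str) -> int:
    """Return the largest number of consecutive G-X-Y triplets in the sequence."""
    best = 0
    for match in _GXY_RUN.finditer(protein.upper()):
        best = max(best, len(match.group(1)) // 3)
    return best


# Made-up sequences used only to exercise the function.
run_a = longest_gxy_run("MKGPPGAPGEKGSW")
run_b = longest_gxy_run("GAGPPGPPGPP")
```

A sequence could then be flagged as collagen-like when the run length reaches a chosen threshold, for example the 20 copies mentioned for this alignment; that threshold is an assumption of this sketch rather than a stated criterion.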

Some members of the collagen superfamily are not involved in connective tissue structure but share the same triple
helical structure.

Number of members: 2125 

[0354] 98. Coprogen_oxidas (Coproporphyrinogen III oxidase)
Number of members: 12

Coproporphyrinogen III oxidase (EC 1.3.3.3) (coproporphyrinogenase) [1,2] catalyzes the oxidative decarboxylation
of coproporphyrinogen III into protoporphyrinogen IX, a common step in the pathway for the biosynthesis of porphyrins
such as heme, chlorophyll or cobalamin.

[0355] Coproporphyrinogen III oxidase is an enzyme that requires iron for its activity. A cysteine seems to be important
for the catalytic mechanism [3]. Sequences from a variety of eukaryotic and prokaryotic sources show that this enzyme
has been evolutionarily conserved. A highly conserved region in the central part of the sequence has been selected
as a signature pattern. This region contains the only conserved cysteine and is rich in charged amino acids.

- Consensus pattern: K-x-W-C-x(2)-[FYH](3)-[LIVM]-x-H-R-x-E-x-R-G-[LIVM]-G-G-[LIVM]-F-F-D

[ 1] Xu K., Elliott T.
J. Bacteriol. 175:4990-4999(1993).

[ 2] Kohno H., Furukawa T., Yoshinaga T., Tokunaga R., Taketani S.
J. Biol. Chem. 268:21359-21363(1993).
[ 3] Camadro J.M., Chambon H., Jolles J., Labbe P.
Eur. J. Biochem. 156:579-587(1986).
[ 4] Xu K., Elliott T.
J. Bacteriol. 176:3196-3203(1994).

[0356] 99. Corona_nucleoca (Coronavirus nucleocapsid protein) 
[1]

Medline: 98087828

Identification of a specific interaction between the
coronavirus mouse hepatitis virus A59 nucleocapsid protein
and packaging signal.
Molenkamp R, Spaan WJ;
Virology 1997;239:78-86.

Number of members: 44
[0357] 100. Cu-oxidase (Multicopper oxidase)
[1]

Medline: 90126844
The blue oxidases, ascorbate oxidase, laccase and ceruloplasmin.
Modelling and structural relationships.
Messerschmidt A, Huber R;
Eur J Biochem 1990;187:341-352.
Number of members: 150

[0358] Multicopper oxidases [1,2] are enzymes that possess three spectroscopically different copper centers. These
centers are called: type 1 (or blue), type 2 (or normal) and type 3 (or coupled binuclear). The enzymes that belong to
this family are:



- Ascorbate oxidase (EC 1.10.3.3), a higher plant enzyme.

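- Ceruloplasmin (EC 1.16.3.1) (ferroxidase), a protein found in the serum of mammals and birds, which oxidizes a
great variety of inorganic and organic substances. Structurally ceruloplasmin exhibits internal sequence homology,
and seems to have evolved from the triplication of a copper-binding domain similar to that found in laccase and
ascorbate oxidase.

In addition to the above enzymes there are a number of proteins which, on the basis of sequence similarities,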

can be said to belong to this family. These proteins are:

- Copper resistance protein A (copA) from a plasmid in Pseudomonas syringae. This protein seems to be involved
in the resistance of the microbial host to copper.

- Blood coagulation factor V (Fa V).

- Blood coagulation factor VIII (Fa VIII) [E1].

- Yeast FET3 [3], which is required for ferrous iron uptake.

- Yeast hypothetical protein YFL041w and SpAC1F7.08, the fission yeast homolog.

of a triplicated A domain, a B domain and a duplicated C domain, in the following order: A-A-B-A-C-C. The A-type
domain is related to the multicopper oxidases.

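Two signature patterns have been developed for these proteins. Both patterns are derived from the same
region which, in ascorbate oxidase, laccase, in the third domain of ceruloplasmin, and in copA, contains five residues
that are known to be involved in the binding of copper centers. The first pattern does not make any assumption on the
presence of copper-binding residues and thus can detect domains that have lost the ability to bind copper (such as
those in Fa V and Fa VIII), while the second pattern is specific to copper-binding domains.

- Consensus pattern: G-x-[FYW]-x-[LIVMFYW]-x-[CST]-x(8)-G-[LM]-x(3)-[LIVMFYW]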

" ~r.':„"S3;'S',-'S£:r »~ - »• •«» 3 1™. c 

[0362] 101. Cullin (Cullin family)
Number of members: 24

[0363] The following proteins are collectively termed cullins [1]:

- Fission yeast hypothetical protein SpAC24H6.03.

- Consensus pattem: l^^^^^'y^{2)mx(2yL,^^^^^ 

[ 1] Kipreos E.T., Lander L.E., Wing J.P., He W.W., Hedgecock E.M.
Cell 85:829-839(1996).

[ 2] Burnatowska-Hledin M.A., Spielman W.S., Smith W.L., Shi P., Meyer J.M.,
Dewitt D.L.
Am. J. Physiol. 268:F1198-F1210(1995).

[ 3] Mathias N., Johnson S.L., Winey M., Adams A.E., Goetsch L., Pringle J.R.,
Byers B., Goebl M.G.
Mol. Cell. Biol. 16:6634-6643(1996).

[0365] 102. (Cu_amine_oxid)
Copper amine oxidase signatures

Amine oxidases (AO) [1] are enzymes that catalyze the oxidation of a wide range of biogenic amines including many
neurotransmitters, histamine and xenobiotic amines. There are two classes of amine oxidases: flavin-containing (EC
1.4.3.4) and copper-containing (EC 1.4.3.6).

[0366] Copper-containing AO is found in bacteria, fungi, plants and animals. It is a homodimeric enzyme that binds
one copper ion per subunit as well as a 2,4,5-trihydroxyphenylalanine quinone (or topaquinone) (TPQ) cofactor. This
cofactor is derived from a tyrosine residue.

[0367] Two signature patterns were derived for copper AO. The first one contains the tyrosine which gives rise to the
TPQ cofactor, while the second one contains one of the three histidines that bind the copper atom [2].
[0368] Consensus pattern: [LIVM]-[LIVMA]-[LIVMF]-x(4)-[ST]-x(2)-N-Y-[DE]-[YN] [The first Y gives rise to TPQ].
Sequences known to belong to this class detected by the pattern: ALL.

[0369] Consensus pattern: T-x-[GS]-x(2)-H-[LIVMF]-x(3)-E-[DE]-x-P [H is a copper ligand]. Sequences known to
belong to this class detected by the pattern: ALL, except for lentil AO.

[ 1] Knowles P.F., Dooley D.M. (In) Metal ions in biological systems; Sigel H., Sigel A., Eds., 30:361-403, Marcel
Dekker, New York, (1993).

[ 2] Parsons M.R., Convery M.A., Wilmot C.M., Yadav K.D.S., Blakeley V., Corner A.S., Phillips S.E.V., McPherson
M.J., Knowles P.F. Structure 3:1171-1184(1995).

[0370] 103. Cys-protease (Cysteine protease) 
Number of members: 358 

[0371] Eukaryotic thiol proteases (EC 3.4.22.-) [1] are a family of proteolytic enzymes which contain an active site
cysteine. Catalysis proceeds through a thioester intermediate and is facilitated by a nearby histidine side chain; an
asparagine completes the essential catalytic triad. The proteases which are currently known to belong to this family
are listed below (references are only provided for recently determined sequences).

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15), and S (EC 3.4.22.27) [2].

- Vertebrate lysosomal dipeptidyl peptidase I (EC 3.4.14.1) (also known as cathepsin C) [2].

- Vertebrate calpains (EC 3.4.22.17). Calpains are intracellular calcium-activated thiol proteases that contain both an
N-terminal catalytic domain and a C-terminal calcium-binding domain.

- Mammalian cathepsin K, which seems involved in osteoclastic bone resorption [3].

- Human cathepsin O [4].

- Bleomycin hydrolase. An enzyme that catalyzes the inactivation of the antitumor drug BLM (a glycopeptide).

- Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/B4; kidney bean EP-C1, rice bean SH-EP; kiwi fruit actinidin
(EC 3.4.22.14); papaya latex papain (EC 3.4.22.2), chymopapain (EC 3.4.22.6), caricain (EC 3.4.22.30), and
proteinase IV (EC 3.4.22.25); pea turgor-responsive protein 15A; pineapple stem bromelain (EC 3.4.22.32); rape
COT44; rice oryzain alpha, beta, and gamma; tomato low-temperature induced; Arabidopsis thaliana A494, RD19A
and RD21A.

- House-dust mites allergens DerP1 and EurM1.

- Cathepsin B-like proteinases from the worms Caenorhabditis elegans (genes gcp-1, cpr-3, cpr-4, cpr-5 and
cpr-6), Schistosoma mansoni (antigen SM31) and japonica (antigen SJ31), Haemonchus contortus (genes AC-1 and
AC-2), and Ostertagia ostertagi (CP-1 and CP-3).

- Slime mold cysteine proteinases CP1 and CP2.

- Cruzipain from Trypanosoma cruzi and brucei.

- Trophozoite cysteine proteinase (TCP) from various Plasmodium species.

- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.

- Baculoviruses cathepsin-like enzyme (v-cath).

- Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a calpain-like domain.

- Yeast thiol protease BLH1/YCP1/LAP3.

- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein.

[0372] Two bacterial peptidases are also part of this family:

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5].
- Thiol protease tpr from Porphyromonas gingivalis.



EP 1033 405 A2 



[0373] Three other proteins are structurally related to this family, but may have lost their proteolytic activity.

- Soybean oil body protein P34. This protein has its active site cysteine replaced by a glycine.

<P^i^ ^ ""'""^ "^"^ ^ « UKWoa«in protein (see 

- Plasmodium falciparum serine-repeat protein (SERA), the major blood stage antigen. This protein of 111 Kd
possesses a C-terminal thiol-protease-like domain [6], but the active site cysteine is replaced by a serine.

[0374] The sequences around the three active site residues are well conserved and can be used as signature patterns.

- Consensus pattern: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] [C is the active site residue]

. consensus panem: [UN^GSTAN,-x-H-,GSACEHUV«^-x4UVl4(2)-G.^^^^ 3C«ve site resi- 

• s;s;^;?isthTSesre^s!;r^^^ 

[ 1] Dufour E.
Biochimie 70:1335-1342(1988).
[ 2] Kirschke H., Barrett A.J., Rawlings N.D.
Protein Prof. 2:1587-1643(1995).
[ 3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J.
FEBS Lett. 357:129-134(1995).
[ 4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C.
J. Biol. Chem. 269:27136-27142(1994).
[ 5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C.
Appl. Environ. Microbiol. 59:330-333(1993).
[ 6] Higgins D.G., McConnell D.J., Sharp P.M.
Nature 340:604-604(1989).
[ 7] Rawlings N.D., Barrett A.J.
Meth. Enzymol. 244:461-486(1994).

J Mol Biol 1996;262:202-224.
[1] Medline: 99059720

Crystal structure of Escherichia coli cystathionine gamma-synthase at 1.5 A resolution.
Clausen T, Huber R, Prade L, Wahl MC, Messerschmidt A;
EMBO J 1998;17:6827-6838.
Database Reference: SCOP; 1cs1; fa; [SCOP-USA][CATH-PDBSUM]

This family includes enzymes involved in cysteine and methionine metabolism. The following are members:

Cystathionine gamma-lyase,
Cystathionine gamma-synthase,
Cystathionine beta-lyase,
Methionine gamma-lyase,
OAH/OAS sulfhydrylase,
O-succinylhomoserine sulfhydrylase.

All of these members participate in slightly different reactions.

All these enzymes use PLP (pyridoxal-5'-phosphate) as a cofactor.

Number of members: 52
- Cystathionine gamma-lyase (EC 4.4.1.1) (gamma-cystathionase), which catalyzes the transformation of
cystathionine into cysteine, oxobutanoate and ammonia. This is the final reaction in the transulfuration pathway that leads
from methionine to cysteine in eukaryotes.

- Cystathionine gamma-synthase (EC 4.2.99.9), which catalyzes the conversion of cysteine and succinyl-homoserine
into cystathionine and succinate: the first step in the biosynthesis of methionine from cysteine in bacteria (gene
metB).

- Cystathionine beta-lyase (EC 4.4.1.8) (beta-cystathionase), which catalyzes the conversion of cystathionine into
homocysteine, pyruvate and ammonia: the second step in the biosynthesis of methionine from cysteine in bacteria
(gene metC).

- Methionine gamma-lyase (EC 4.4.1.11) (L-methioninase), which catalyzes the transformation of methionine into
methanethiol, oxobutanoate and ammonia.

- OAH/OAS sulfhydrylase, which catalyzes the conversion of acetylhomoserine into homocysteine and that of
acetylserine into cysteine (gene MET17 or MET25 in yeast).

- O-succinylhomoserine sulfhydrylase (EC 4.2.99.-).

- Yeast hypothetical protein YGL184c.

- Yeast hypothetical protein YHR112c.

[0377] These enzymes are proteins of about 400 amino-acid residues. The pyridoxal-P group is attached to a lysine
residue located in the central section of these enzymes; the sequence around this residue is highly conserved and can
be used as a signature pattern to detect this class of enzymes.

- Consensus pattern: [DQ]-[LIVMF]-x(3)-[STAGC]-[STAGCI]-T-K-[FYWQ]-[LIVMF]-x-G-[HQ]-[SGNH] [K is the
pyridoxal-P attachment site]

[ 1] Ono B.I., Tanaka K., Naito K., Heike C., Shinoda S., Yamamoto S., Ohmori S., Oshima T., Toh-E A. J. Bacteriol.
174:3339-3347(1992).

[ 2] Barton A.B., Kaback D.B., Clark M.W., Keng T., Ouellette B.F.F., Storms R.K., Zeng B., Zhong W.W., Fortin
N., Delaney S., Bussey H. Yeast 9:363-369(1993).

[0378] 105. Cyt_reductase 
FAD/NAD-binding Cytochrome reductase 
Number of members: 60 
[1] Medline: 95111952 

Crystal structure of the FAD-containing fragment of corn nitrate reductase at 2.5 A resolution: relationship to other
flavoprotein reductases.
Lu G, Campbell WH, Schneider G, Lindqvist Y;
Structure 1994;2:809-821.
[2] Medline: 92084635

The sequence of squash NADH:nitrate reductase and its relationship to the sequences of other flavoprotein
oxidoreductases. A family of flavoprotein pyridine nucleotide cytochrome reductases.
Hyde GE, Crawford NM, Campbell WH;

J Biol Chem 1991;266:23542-23547. 
[0379] 106. Cytidylyltrans
Phosphatidate cytidylyltransferase
Number of members: 21

[0380] Phosphatidate cytidylyltransferase (EC 2.7.7.41) [1,2,3] (also known as CDP-diacylglycerol synthase) (CDS)
is the enzyme that catalyzes the synthesis of CDP-diacylglycerol from CTP and phosphatidate (PA). CDP-diacylglycerol
is an important branch point intermediate in both prokaryotic and eukaryotic organisms. CDS is a membrane-bound
enzyme. A conserved region located in the C-terminal part has been selected as a signature pattern.

- Consensus pattern: S-x-[LIVMF]-K-R-x(4)-K-D-x-[GSA]-x(2)-[LI]-[PG]-x-H-G-G-[LIVM]-x-D-R-[LIVMF]-D

[ 1] Sparrow C.P., Raetz C.R.H.
J. Biol. Chem. 260:12084-12091(1985).
[ 2] Shen H., Heacock P.N., Clancey C.J., Dowhan W.
J. Biol. Chem. 271:789-795(1996).
[ 3] Saito S., Goto K., Tonosaki A., Kondo H.
J. Biol. Chem. 272:9503-9509(1997).
Number of members: 64

[0383] 108. (cNMP binding) Cyclic nucleotide-binding domain signatures and profile

Proteins that bind cyclic nucleotides (cAMP or cGMP) share a structural domain of about 120 residues [1,2,3]. The best
studied of these proteins is the prokaryotic catabolite gene activator (also known as the cAMP receptor protein) (gene
crp), where such a domain is known to be composed of three alpha-helices and a distinctive eight-stranded, antiparallel
beta-barrel structure. Such a domain is known to exist in the following proteins:

- Prokaryotic catabolite gene activator protein (CAP).

- cAMP- and cGMP-dependent protein kinases (cAPK and cGPK). Both types of kinases contain two tandem copies
of the cyclic nucleotide-binding domain. The cAPKs are composed of two different subunits: a catalytic chain and a
regulatory chain, which contains both copies of the domain.

Tc^raropsr^ors^^^^ 

cGPK is an alanine in rnost CAPK.- Vertebrate cTcli^ucSS^t^f^t^^^^ a threonine that, is invariant in 

First consensus pattern: [LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIVMFY](4)-x(2)-G
second consensus pattern: ILtVMF].G.E-x-lGASHUVMhx(5J1^ 

[ 1] Weber I.T., Shabb J.B., Corbin J.D. Biochemistry 28:6122-6127(1989) 

[ 2] Kaupp U.B. Trends Neurosci. 14:150-157(1991). 

[ 3] Shabb J.B., Corbin J.D. J. Biol. Chem. 267:5723-5726(1992).

[0384] 109. (cadherin) 

Cadherins extracellular repeated domain signature

- Epithelial (E-cadherin) (also known as uvomorulin or L-CAM) (CDH1).

- Neural (N-cadherin) (CDH2).

- Placental (P-cadherin) (CDH3).

- Retinal (R-cadherin) (CDH4).

- Vascular endothelial (VE-cadherin) (CDH5).

- Kidney (K-cadherin) (CDH6).

- Cadherin-8 (CDH8).

- Osteoblast (OB-cadherin) (CDH11).

- Brain (BR-cadherin) (CDH12).

- T-cadherin (truncated cadherin) (CDH13).

- Muscle (M-cadherin) (CDH14).

- Liver-intestine (LI-cadherin).

- EP-cadherin.

!^sidr.:ranraTi^^^^ 

terminal cytoplasmic domain of about ISOre^dur^^^rf^SiM I •'^"^^'"brane reg«n. and finally a C- 
arefourrepeatsof about llOresidSilfS^Sra l^oi^Zl^^^"" ^"''^ "'^ "^"^ '''"^ 
;^.iurn-.i„.,g ,egk. of cadherinristStSr„.rrer;C^^^^ " 

Kvo?eitsr.rj^^^^^ 
- Desmoglein 1 (desmosomal glycoprotein I).

- Desmoglein 2.

- Desmoglein 3 (Pemphigus vulgaris antigen).

[0387] The Drosophila fat protein [3] is a huge protein of over 5000 amino acids that contains 34 cadherin-like repeats
in its extracellular domain.

[0388] The signature pattern that was developed for the repeated domain is located in its C-terminal extremity,
which is its best conserved region. The pattern includes two conserved aspartic acid residues as well as two
asparagines; these residues could be implicated in the binding of calcium.

[0389] Consensus pattern: [LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P. Sequences known to belong to this class detected by the
pattern: ALL. Note: this pattern is found in the first, second, and fourth copies of the repeated domain. In the third copy
there is a deletion of one residue after the second conserved Asp.

[ 1] Takeichi M. Annu. Rev. Biochem. 59:237-252(1990).
[ 2] Takeichi M. Trends Genet. 3:213-217(1987).

[ 3] Mahoney P.A., Weber U., Onofrechuk P., Biessmann H., Bryant P.J., Goodman C.S. Cell 67:853-868(1991).
[0390] 110. Calreticulin family signatures

Calreticulin [1] (also known as calregulin, CRP55 or HACBP) is a high-capacity calcium-binding protein which is present
in most tissues and located at the periphery of the endoplasmic (ER) and the sarcoplasmic reticulum (SR) membranes.
It probably plays a role in the storage of calcium in the lumen of the ER and SR and it may well have other important
functions. Structurally, calreticulin is a protein of about 400 amino acid residues consisting of three domains: a) An
N-terminal, probably globular, domain of about 180 amino acid residues (N-domain); b) A central domain of about 70
residues (P-domain) which contains three repeats of an acidic 17 amino acid motif. This region binds calcium with a
low capacity, but a high affinity; c) A C-terminal domain rich in acidic residues and in lysine (C-domain). This region
binds calcium with a high capacity but a low affinity. Calreticulin is evolutionarily related to the following proteins: -
Onchocerca volvulus antigen RAL-1. RAL-1 is highly similar to calreticulin, but possesses a C-terminal domain rich in
lysine and arginine and lacks acidic residues and is therefore not expected to bind calcium in that region. - Calnexin
[2]. A calcium-binding protein that interacts with newly synthesized glycoproteins in the endoplasmic reticulum. It seems
to play a major role in the quality control apparatus of the ER by the retention of incorrectly folded proteins. - Calmegin
[3] (or calnexin-T), a testis-specific calcium-binding protein highly similar to calnexin. Three signature patterns have
been developed for this family of proteins. The first two patterns are based on conserved regions in the N-domain; the
third pattern corresponds to positions 4 to 16 of the repeated motif in the P-domain.

Consensus pattern: [KRHN]-x-[DEQN]-[DEQNK]-x(3)-C-G-G-[AG]-[FY]-[LIVM]-[KN]-[LIVMFY](2)

Consensus pattern: [LIVM](2)-F-G-P-D-x-C-[AG]

Consensus pattern: [IV]-x-D-x-[DENST]-x(2)-K-P-[DEH]-D-W-[DEN]

[ 1] Michalak M., Milner R.E., Burns K., Opas M. Biochem. J. 285:681-692(1992).

[ 2] Bergeron J.J.M., Brenner M.B., Thomas D.Y., Williams D.B. Trends Biochem. Sci. 19:124-128(1994).

[ 3] Watanabe D., Yamada K., Nishina Y., Tajima Y., Koshimizu U., Nagata A., Nishimune Y. J. Biol. Chem.
269:7744-7749(1994).

[0391] 111. Eukaryotic-type carbonic anhydrases signature (carb_anhydrase) 

Carbonic anhydrases (EC 4.2.1.1) (CA) [1,2,3,4] are zinc metalloenzymes which catalyze the reversible hydration of
carbon dioxide. Eight enzymatic and evolutionarily related forms of carbonic anhydrase are currently known to exist in
vertebrates: three cytosolic isozymes (CA-I, CA-II and CA-III); two membrane-bound forms (CA-IV and CA-VII); a
mitochondrial form (CA-V); a secreted salivary form (CA-VI); and a yet uncharacterized isozyme [5]. In the alga
Chlamydomonas reinhardtii, two CA isozymes have been sequenced [6]. They are periplasmic glycoproteins
evolutionarily related to vertebrate CAs. Some bacteria, such as Neisseria gonorrhoeae [7], also have a eukaryotic-type CA.
CAs contain a single zinc atom bound to three conserved histidine residues. As a signature for CAs, a pattern has
been developed which includes one of these zinc-binding histidines. Protein D8 from Vaccinia and other poxviruses is
related to CAs but has lost two of the zinc-binding histidines as well as many otherwise conserved residues. This is
also true of the N-terminal extracellular domain of some receptor-type tyrosine-protein phosphatases (see
<PDOC00323>).

Consensus pattern: S-E-[HN]-x-[LIVM]-x(4)-[FYH]-x(2)-E-[LIVMGA]-H-[LIVMFA](2) [The second H is a zinc ligand]
Note: most prokaryotic CAs as well as plant chloroplast CAs belong to another, evolutionarily distinct family of proteins
(see <PDOC00586>).

[ 1] Deutsch H.F. Int. J. Biochem. 19:101-113(1987).

[ 2] Fernley R.T. Trends Biochem. Sci. 13:356-359(1988).

[ 3] Tashian R.E. BioEssays 10:186-192(1989).

[ 4] Edwards Y. Biochem. Soc. Trans. 18:171-175(1990).

[ 5] Skaggs L.A., Bergenhem N.C.H., Venta P.J., Tashian R.E. Gene 126:291-292(1993).

[ 6] Fujiwara S., Fukuzawa H., Tachiki A., Miyachi S. Proc. Natl. Acad. Sci. U.S.A. 87:9779-9783(1990).

[ 7] Huang S., Xue Y., Sauer-Eriksson E., Chirica L., Lindskog S., Jonsson B.H. J. Mol. Biol. 283:301-310(1998).



[0392] 112. Caseins alpha/beta signature 

Caseins [1] are the major protein constituent of milk. Caseins can be classified into two families; the first consists of
the kappa-caseins, and the second groups the alpha-s1, alpha-s2 and beta-caseins. The alpha/beta caseins are a
rapidly diverging family of proteins. However, two regions are conserved: a cluster of phosphorylated serine residues
and the signal sequence. The signature pattern has been developed for this family of proteins based upon the last
eight residues of the signal sequence.
Consensus pattern: C-L-[LV]-A-x-A-[LVF]-A
[ 1] Holt C., Sawyer L. Protein Eng. 2:251-259(1988).
[0393] 113. Catalase signatures

Catalase (EC 1.11.1.6) [1,2,3] is an enzyme, present in all aerobic cells, that decomposes hydrogen peroxide to
molecular oxygen and water. Its main function is to protect cells from the toxic effects of hydrogen peroxide. In eukaryotic
organisms and in some prokaryotes catalase is a molecule composed of four identical subunits. Each of the subunits
binds one protoheme IX group. A conserved tyrosine serves as the heme proximal side ligand. The region around this
residue has been used as a first signature pattern; it also includes a conserved arginine that participates in
heme-binding. A conserved histidine has been shown to be important for the catalytic mechanism of the enzyme. The region
around this residue has been selected as a second signature pattern.

Consensus pattern: R-[LIVMFSTAN]-F-[GASTNP]-Y-x-D-[AST]-[QEH] [Y is the proximal heme-binding ligand]
Consensus pattern: [IF]-x-[RH]-x(4)-[EQ]-R-x(2)-H-x(2)-[GAS]-[GASTF]-[GAST] [H is an active site residue]
Note: some prokaryotic catalases belong to the peroxidase family (see <PDOC00394>).
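
The consensus patterns in these entries use a compact notation: a bare letter is a fixed residue, square brackets list the residues allowed at a position, curly braces list residues that are excluded, x is any residue, and a parenthesized number or range is a repeat count. As a minimal sketch, assuming that reading of the notation, a pattern can be translated into a regular expression and searched against a sequence. The function name prosite_to_regex and the test peptide are hypothetical and are not part of the source.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    # Drop any trailing '-' or '.' and split the pattern into its elements.
    for element in pattern.strip().rstrip("-.").split("-"):
        # Separate the element from an optional repeat suffix, e.g. (2) or (4,5).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"   # excluded residues
        else:
            regex = core                      # fixed residue or [allowed set]
        if low and high:
            regex += "{%s,%s}" % (low, high)
        elif low:
            regex += "{%s}" % low
        parts.append(regex)
    return "".join(parts)


# First catalase signature, as listed above.
CATALASE_HEME = "R-[LIVMFSTAN]-F-[GASTNP]-Y-x-D-[AST]-[QEH]"

# Hypothetical peptide built by hand to contain one match (not a real catalase).
peptide = "MKRLFSYADTQG"

match = re.search(prosite_to_regex(CATALASE_HEME), peptide)
print(match.group(), match.start() + 1)  # RLFSYADTQ 3
```

Positions are reported 1-based, as is usual for protein sequences. The same function handles the other patterns in this listing, including ranges such as x(4,5).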

[ 1] Murthy M.R.N., Reid T.J. III, Sicignano A., Tanaka N., Rossmann M.G. J. Mol. Biol. 152:465-499(1981).

[ 2] Melik-Adamyan W.R., Barynin V.V., Vagin A.A., Borisov V.V., Vainshtein B.K., Fita I., Murthy M.R.N., Rossmann M.G. J. Mol. Biol. 188:63-72(1986).

[ 3] von Ossowski I., Hausner G., Loewen P.C. J. Mol. Evol. 37:71-76(1993).
[0394] 114. (chitin binding) Chitin recognition or binding domain signature

A conserved domain of 43 amino acids is found in several plant and fungal proteins that have a common binding
specificity for oligosaccharides of N-acetylglucosamine [1]. This domain may be involved in the recognition or binding
of chitin subunits. It has been found in the proteins listed below.

- A number of non-leguminous plant lectins. The best characterized of these lectins are the three highly homologous
wheat germ agglutinins (WGA-1, 2 and 3). WGA is an N-acetylglucosamine/N-acetylneuraminic acid binding lectin
which structurally consists of a fourfold repetition of the 43 amino acid domain. The same type of structure is found
in a barley root-specific lectin as well as a rice lectin.
- Plant endochitinases (EC 3.2.1.14) from class IA (see <PDOC00620>). Endochitinases are enzymes that catalyze
the hydrolysis of the beta-1,4 linkages of N-acetylglucosamine polymers of chitin. Plant chitinases function as a defense
against chitin-containing fungal pathogens. Class IA chitinases generally contain one copy of the chitin-binding domain
at their N-terminal extremity. An exception is agglutinin/chitinase [2] from the stinging nettle Urtica dioica, which contains
two copies of the domain.
- Hevein, a wound-induced protein found in the latex of rubber trees.
- Win1 and win2, two wound-induced proteins from potato.
- Kluyveromyces lactis killer toxin alpha subunit [3]. The toxin encoded by the linear plasmid pGKL1 is composed of
three subunits: alpha, beta, and gamma. The gamma subunit harbors toxin activity and inhibits growth of sensitive yeast
strains in the G1 phase of the cell cycle; the alpha subunit, which is proteolytically processed from a larger precursor
that also contains the beta subunit, is a chitinase (see <PDOC00839>).

In chitinases, as well as in the potato wound-induced proteins, the 43-residue domain directly follows the signal
sequence and is therefore at the N-terminal of the mature protein; in the killer toxin alpha subunit it is located in the central
section of the protein. The domain contains eight conserved cysteine residues which have all been shown in WGA
to be involved in disulfide bonds. The topological arrangement of the four disulfide bonds is shown in the following
figure:

[Figure not reproduced: schematic of the four disulfide bonds linking the eight conserved cysteines.]

'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.

- Consensus pattern: C-x(4,5)-C-C-S-x(2)-G-x-C-G-x(4)-[FYW]-C [The five C's are involved in disulfide bonds]

[ 1] Wright H.T., Sandrasegaram G., Wright C.S. J. Mol. Evol. 33:283-294(1991).
[ 2] Lerner D.R., Raikhel N.V. J. Biol. Chem. 267:11085-11091(1992).
[ 3] Butler A.R., O'Donnell R.W., Martin V.J., Gooday G.W., Stark M.J.R. Eur. J. Biochem. 199:483-488(1991).

[0395] 115. (Chitinase 1) Chitinases family 19 signatures

Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the beta-1,4-N-acetyl-D-glucosamine linkages
in chitin polymers. From the viewpoint of sequence similarity, chitinases belong to either family 18 or 19 in the
classification of glycosyl hydrolases [2,E1]. Chitinases of family 19 (also known as classes IA or I and IB or II) are enzymes
from plants that function in the defense against fungal and insect pathogens by destroying their chitin-containing cell
wall. Class IA/I and IB/II enzymes differ in the presence (IA/I) or absence (IB/II) of an N-terminal chitin-binding domain
(see the relevant entry <PDOC00025>). The catalytic domain of these enzymes consists of about 220 to 230 amino acid
residues. Two highly conserved regions have been selected as signature patterns. The first one is located in the
N-terminal section and contains one of the six cysteines which are conserved in most, if not all, of these chitinases and
which is probably involved in a disulfide bond.

Consensus pattern: C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF]-x-A-x(3)-[YF]-x(2)-F-[GSA]
Consensus pattern: [LIVM]-[GSA]-F-x-[STAG](2)-[LIVMFY]-W-[FY]-W-[LIVM]

[ 1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).

[ 2] Henrissat B. Biochem. J. 280:309-316(1991).

[0396] 116. chloroa_b-bind

Chlorophyll A-B binding proteins. Number of members: 211
[0397] 117. chromo

The 'chromo' (CHRromatin Organization Modifier) domain [1 to 4] is a conserved region of about 60 amino acids which
was originally found in Drosophila modifiers of variegation, which are proteins that modify the structure of chromatin
to the condensed morphology of heterochromatin, a cytologically visible condition where gene expression is repressed.
In protein Polycomb, the chromo domain has been shown to be important for chromatin targeting. Proteins that contain
a chromo domain seem to fall into three classes:

a) Proteins which have an N-terminal chromo domain followed by a region which is related to but distinct from the
chromo domain and which has been termed [3] the 'chromo shadow' domain.

b) Proteins with a single chromo domain.

c) Proteins with paired tandem chromo domains.

[0398] Currently, this domain has been found in the following proteins:
[0399] Class A.

- Drosophila heterochromatin protein Su(var)205 (HP1).
- Human heterochromatin protein HP1 alpha.
- Mammalian modifier 1 and modifier 2.
- Fission yeast swi6, a protein involved in the repression of the silent mating-type loci mat2 and mat3.

[0400] Class B.

- Drosophila protein Polycomb (Pc).
- Mammalian modifier 3, a homolog of Pc.
- Drosophila protein Su(var)3-9, a suppressor of position-effect variegation.
- Human Mi-2 autoantigen, characteristic of dermatomyositis.
- Fungal retrotransposon polyproteins: 'skippy' from Fusarium oxysporum, 'grasshopper' and 'MAGGY' from
Magnaporthe grisea and CfT-1 from Cladosporium fulvum.
- Fission yeast hypothetical protein SpAC18G6.02c.
- Caenorhabditis elegans hypothetical protein C29H12.5.
- Caenorhabditis elegans hypothetical protein ZK1236.2.
- Caenorhabditis elegans hypothetical protein T09A5.8.

[0401] Class C.

- Mammalian DNA-binding/helicase proteins CHD-1 to CHD-4.
- Yeast protein CHD1.

[0402] The signature pattern for this domain corresponds to its best conserved section, which is located in its central
part.

- Consensus pattern: [FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLME]-x(5,6)-[ST]-W-[ESV]-[PSTDEN]-x
[ 1] Paro R. Trends Genet. 6:416-421(1990).

[ 2] Singh P.B., Miller J.R., Pearce J., Kothary R., Burton R.D., Paro R., James T.C., Gaunt S.J. Nucleic Acids Res. 19:789-794(1991).

[ 3] Aasland R., Stewart A.F. Nucleic Acids Res. 23:3168-3173(1995).

[ 4] Koonin E.V., Zhou S., Lucchesi J.C. Nucleic Acids Res. 23:4229-4233(1995).

[0403] 118. citrate_synt 

Citrate synthase (EC 4.1.3.7) (CS) is the tricarboxylic acid cycle enzyme that catalyzes the synthesis of citrate from
oxaloacetate and acetyl-CoA.
[0404] In prokaryotes, citrate synthase is composed of six identical subunits. In eukaryotes there are two isozymes.
There are a number of regions of sequence similarity between prokaryotic and eukaryotic citrate synthases.
One of the best conserved contains a histidine which is one of three residues shown [1] to be involved in the catalytic
mechanism of the vertebrate mitochondrial enzyme. This region has been used as a signature pattern.

- Consensus pattern: G-[FYA]-[GA]-H-x-[IV]-x(1,2)-[RKT]-x(2)-D-[PS]-R [H is an active site residue]

[ 1] Karpusas M., Branchaud B., Remington S.J. Biochemistry 29:2213-2219(1990).

[0405] 119. clpA_B

Chaperonin clpA/B

Number of members: 39

- Escherichia coli clpA, which acts as the regulatory subunit of the ATP-dependent protease clp.
- Rhodopseudomonas blastica clpA homolog.
- Escherichia coli heat shock protein clpB and homologs in other bacteria.
- Bacillus subtilis protein mecB.
- Yeast heat shock protein 104 (gene HSP104), which is vital for tolerance to heat, ethanol and other stresses.
- Neurospora heat shock protein hsp98.
- Yeast mitochondrial heat shock protein 78 (gene HSP78) [3].
- CD4A and CD4B, two highly related tomato proteins that seem to be located in the chloroplast.
- Trypanosoma brucei protein clp.
- Porphyra purpurea chloroplast encoded clpC.

of the ATP-binding B motif. The second pattern is located in the second domain in between the ATP-binding A and B
motifs.

- Consensus pattern: D-[AI]-[SGA]-N-[LIVMF](2)-K-[PT]-x-L-x(2)-G

- Consensus pattern: R-[LIVMFY]-D-x-S-E-[LIVMFY]-x-E-[KRQ]-x-[STA]-x-[STA]-[KR]-[LIVM]-x-G-[STA]

[ 1] Gottesman S., Squires C., Pichersky E., Carrington M., Hobbs M., Mattick J.S., Dalrymple B., Kuramitsu H.,
Shiroza T., Foster T., Clark W.P., Ross B., Squires C.L., Maurizi M.R. Proc. Natl. Acad. Sci. U.S.A. 87:3513-3517
(1990).

[ 2] Parsell D.A., Sanchez Y., Stitzel J.D., Lindquist S. Nature 353:270-273(1991).

[ 3] Leonhardt S.A., Fearon K., Danese P.N., Mason T.L. Mol. Cell. Biol. 13:6304-6313(1993).

[0410] 120. cofilin_ADF
Cofilin/tropomyosin-type actin-binding proteins

[1]
Medline: 97290449
Structure determination of yeast cofilin.
Fedorov AA, Lappalainen P, Fedorov EV, Drubin DG, Almo SC;
Nat Struct Biol 1997;4:366-369.

[2]
Medline: 97290450
Crystal structure of the actin-binding protein actophorin from Acanthamoeba.
Leonard SA, Gittis AG, Petrella EC, Pollard TD, Lattman EE;
Nat Struct Biol 1997;4:369-373.

[3]
Medline: 97420794
F-actin and G-actin binding are uncoupled by mutation of conserved tyrosine residues in maize actin depolymerizing
factor.
Jiang CJ, Weeds AG, Khan S, Hussey PJ;
Proc Natl Acad Sci U S A 1997;94:9973-9978.

[4]
Medline: 97357155
Cofilin promotes rapid actin filament turnover in vivo.
Lappalainen P, Drubin DG;
Nature 1997;388:78-82.

Severs actin filaments and binds to actin monomers.
Number of members: 44

[0411] Actin-depolymerizing proteins sever actin filaments (F-actin) and/or bind to actin monomers, or G-actin, thus
preventing actin polymerization by sequestering the monomers. The following proteins are evolutionarily related and
belong to a family of low molecular weight (137 to 166 residues) actin-depolymerizing proteins [1,2,3,4]:

- Cofilin from vertebrates, slime mold and yeast. Cofilin binds to F-actin and acts as a pH-dependent
actin-depolymerizing protein.
- Destrin from vertebrates. Destrin binds to G-actin in a pH-independent manner and prevents polymerization.
- Caenorhabditis elegans unc-60.
- Acanthamoeba castellanii actophorin.
- Plant actin depolymerizing factor (ADF).

[0412] The most conserved region of these proteins is a twenty amino-acid segment that ends some 30 residues 
from their C-terminal extremity. This segment has been shown [5] to be important for actin-binding. 

- Consensus pattern: P-[DE]-x-[SA]-x-[LIVMT]-[KR]-x-[KR]-M-[LIVM]-[YA]-[STA](3)-x(3)-[LIVMF]-[KR]

[ 1] Hawkins M., Pope B., Maciver S.K., Weeds A.G. Biochemistry 32:9985-9993(1993).

[ 2] Iida K., Moriyama K., Matsumoto S., Kawasaki H., Nishida E., Yahara I. Gene 124:115-120(1993).

[ 3] Quirk S., Maciver S.K., Ampe C., Doberstein S.K., Kaiser D.A., van Damme J., Vandekerckhove J., Pollard T.D. Biochemistry 32:8525-8533(1993).

[ 4] McKim K.S., Matheson C., Marra M.A., Wakarchuk M.F., Baillie D.L. Mol. Gen. Genet. 242:346-357(1994).

[ 5] Moriyama K., Yonezawa N., Sakai H., Yahara I., Nishida E. J. Biol. Chem. 267:7240-7244(1992).

[0413] 121. (Complex 24kd) Respiratory-chain NADH dehydrogenase 24 Kd subunit signature

Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or NADH-ubiquinone
oxidoreductase) is an oligomeric enzymatic complex located in the inner mitochondrial membrane which also seems to
exist in the chloroplast and in cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide
subunits of this bioenergetic enzyme complex there is one with a molecular weight of 24 Kd (in mammals), which is a
component of the iron-sulfur (IP) fragment of the enzyme. It seems to bind a 2Fe-2S iron-sulfur cluster. The 24 Kd
subunit is nuclear encoded, as a precursor form with a transit peptide, in mammals and in Neurospora crassa. The 24 Kd
subunit is highly similar to: - Subunit E of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoE). - Subunit
NQO2 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase. A highly conserved region, located in the central
section of this subunit and containing two conserved cysteines that are probably involved in the binding of the 2Fe-2S
center, has been selected as a signature pattern.

- Consensus pattern: D-x(2)-F-[ST]-x(5)-C-L-G-x-C-x(2)-[GA]-P [The two C's are putative 2Fe-2S ligands]

[ 1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987).

[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991).

[ 3] Fearnley I.M., Walker J.E. Biochim. Biophys. Acta 1140:105-134(1992).

[ 4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H. J. Mol. Biol. 233:109-122(1993).

[0414] 122. copper-bind 

Copper binding proteins, plastocyanin/azurin family 

Number of members: 70 

[0415] Blue or 'type-1' copper proteins are small proteins which bind a single copper atom and which are
characterized by an intense electronic absorption band near 600 nm. The most well known members of this class of proteins
are the plant chloroplastic plastocyanins, which exchange electrons with cytochrome c6, and the distantly related
bacterial azurins, which exchange electrons with cytochrome c551. This family of proteins also includes all the proteins
listed below (references are only provided for recently determined sequences).

- Amicyanin from bacteria such as Methylobacterium extorquens or Thiobacillus versutus that can grow on
methylamine. Amicyanin appears to be an electron receptor for methylamine dehydrogenase.
- Auracyanins A and B from Chloroflexus aurantiacus [3]. These proteins can donate electrons to cytochrome c-554.
- Blue copper protein from Alcaligenes faecalis.
- Cupredoxin (CPC) from cucumber peelings [4].
- Cusacyanin (basic blue protein; plantacyanin, CBP) from cucumber.
- Halocyanin from Natrobacterium pharaonis [5], a membrane associated copper-binding protein.
- Pseudoazurin from Pseudomonas.
- Rusticyanin from Thiobacillus ferrooxidans. Rusticyanin is an electron carrier from cytochrome c-552 to the a-type
oxidase [6].
- Stellacyanin from the Japanese lacquer tree.
- Umecyanin from horseradish roots.

' SStraS';^ "d'Cer""" "^''^ ''^^ '° 

[0416] Although there is an appreciable amount of divergence in the sequence of all these proteins, the copper ligand
sites are conserved and a pattern which includes two of the ligands (a cysteine and a histidine) has been developed.

- Consensus pattern: [GA]-x(0,2)-[YSA]-x(0,1)-[VFY]-x-C-x(1,2)-[PG]-x(0,1)-H-x(2,4)-[MQ] [C and H are copper ligands]

1 2i LG^HuSr^^^|' E^^^^^^^ ^^-■^^^■^^soss.y 

LTB^rc^r 2e';;S3f:^^^^ ^-^^ B-Kensh, R. 

[ 4] Mann K., Schaefer W., Thoenes U., Messerschmidt A., Mehrabian Z., Nalbandyan R. FEBS Lett. 314:220-223(1992).

[ 5] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945(1994).
[ 6] Yano T., Fukumori Y., Yamanaka T. FEBS Lett. 288:159-162(1991).
[0417] 123. Chaperonins cpn10 signature

Chaperonins [1,2] are proteins involved in the folding of proteins or the assembly of oligomeric protein complexes.
They seem to assist other polypeptides in maintaining or assuming conformations which permit their correct assembly
into oligomeric structures. They are found in abundance in prokaryotes, chloroplasts and mitochondria. Chaperonins
form oligomeric complexes and are composed of two different types of subunits: a 60 Kd protein, known as cpn60
(groEL in bacteria) and a 10 Kd protein, known as cpn10 (groES in bacteria). The cpn10 protein binds to cpn60 in the
presence of MgATP and suppresses the ATPase activity of the latter. Cpn10 is a protein of about 100 amino acid
residues whose sequence is well conserved in bacteria, vertebrate mitochondria and plant chloroplasts [3,4]. Cpn10
assembles as a heptamer that forms a dome [5]. As a signature pattern for cpn10, a region located in the N-terminal
section of the protein was selected.

Consensus pattern: [LIVMFY]-x-P-[ILT]-x-[DEN]-[KR]-[LIVMFA](3)-[KREQ]-x(8,9)-[SG]-x-[LIVMFY](3)

Note: this pattern is found twice in the plant chloroplast protein, which consists of the tandem repeat of a cpn10 domain.
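
The note about the tandem repeat can be illustrated by counting non-overlapping matches of the cpn10 signature in a sequence. The following is only a sketch: the regular expression is a hand translation of the consensus pattern above, and the sequence is a hypothetical construct of two identical synthetic units joined by a linker, not a real chloroplast protein.

```python
import re

# Hand translation of:
# [LIVMFY]-x-P-[ILT]-x-[DEN]-[KR]-[LIVMFA](3)-[KREQ]-x(8,9)-[SG]-x-[LIVMFY](3)
CPN10 = re.compile(
    r"[LIVMFY].P[ILT].[DEN][KR][LIVMFA]{3}[KREQ].{8,9}[SG].[LIVMFY]{3}"
)

# Hypothetical 24-residue unit written to satisfy the pattern exactly once.
unit = "LAPLADRLLLK" + "A" * 8 + "GALLL"

# Two units joined by a short linker mimic a tandem repeat of the domain.
tandem = unit + "GGSGG" + unit

# finditer returns non-overlapping matches from left to right.
starts = [m.start() + 1 for m in CPN10.finditer(tandem)]
print(starts)  # [1, 30]
```

A single-domain sequence would give one start position, and the synthetic tandem construct gives two.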

[ 1] Ellis R.J., van der Vies S.M. Annu. Rev. Biochem. 60:321-347(1991).

[ 2] Zeilstra-Ryalls J., Fayet O., Georgopoulos C. Annu. Rev. Microbiol. 45:301-325(1991).

[ 3] Hartman D.J., Hoogenraad N.J., Condron R., Hoj P.B. Proc. Natl. Acad. Sci. U.S.A. 89:3394-3398(1992).

[ 4] Bertsch U., Soll J., Seetharam R., Viitanen P.V. Proc. Natl. Acad. Sci. U.S.A. 89:8696-8700(1992).

[ 5] Hunt J.F., Weaver A.J., Landry S.J., Gierasch L., Deisenhofer J. Nature 379:37-45(1996).

[0418] 124. Chaperonins cpn60 signature (cpn60_TCP1)

Chaperonins [1,2] are proteins involved in the folding of proteins or the assembly of oligomeric protein complexes.
Their role seems to be to assist other polypeptides to maintain or assume conformations which permit their correct
assembly into oligomeric structures. They are found in abundance in prokaryotes, chloroplasts and mitochondria.
Chaperonins form oligomeric complexes and are composed of two different types of subunits: a 60 Kd protein, known as
cpn60 (groEL in bacteria) and a 10 Kd protein, known as cpn10 (groES in bacteria). The cpn60 protein shows weak
ATPase activity and is a highly conserved protein of about 550 to 580 amino acid residues which has been described
by different names in different species:

- Escherichia coli groEL protein, which is essential for the growth of the bacteria and the assembly of several
bacteriophages.
- Cyanobacterial groEL analogues.
- Mycobacterium tuberculosis and leprae 65 Kd antigen, Coxiella burnetii heat shock protein B (gene htpB), Rickettsia
tsutsugamushi major antigen 58, and Chlamydial 57 Kd hypersensitivity antigen (gene hypB).
- Chloroplast RuBisCO subunit binding-protein alpha and beta chains, which bind ribulose bisphosphate carboxylase
small and large subunits and are implicated in the assembly of the enzyme oligomer.
- Mammalian mitochondrial matrix protein P1 (mitonin or P60).
- Yeast HSP60 protein, a mitochondrial assembly factor.

As a signature pattern for these proteins, a rather well-conserved region of twelve residues, located in the last third
of the cpn60 sequence, was chosen.
Consensus pattern: A-[AS]-x-[DEQ]-E-x(4)-G-G-[GA]

[ 1] Ellis R.J., van der Vies S.M. Annu. Rev. Biochem. 60:321-347(1991).

[ 2] Zeilstra-Ryalls J., Fayet O., Georgopoulos C. Annu. Rev. Microbiol. 45:301-325(1991).

[0419] Chaperonins TCP-1 signatures (cpn60_TCP1)

The TCP-1 protein [1,2] (Tailless Complex Polypeptide 1) was first identified in mice, where it is especially abundant in
testis but present in all cell types. It has since been found and characterized in many other mammalian species, in
Drosophila and in yeast. TCP-1 is a highly conserved protein of about 60 Kd (556 to 560 residues) which participates
in a hetero-oligomeric 900 Kd double-torus shaped particle [3] with 6 to 8 other different subunits. These subunits, the
chaperonin containing TCP-1 (CCT) subunit beta, gamma, delta, epsilon, zeta and eta, are evolutionarily related to
TCP-1 itself [4,5]. The CCT is known to act as a molecular chaperone for tubulin, actin and probably some other proteins.
[0420] The CCT subunits are highly related to archaebacterial counterparts: - TF55 and TF56 [6], a molecular
chaperone from Sulfolobus shibatae. TF55 has ATPase activity, is known to bind unfolded polypeptides and forms an
oligomeric complex of two stacked nine-membered rings. - Thermosome [7], from Thermoplasma acidophilum. The
thermosome is composed of two subunits (alpha and beta) and also seems to be a chaperone with ATPase activity. It
forms an oligomeric complex of eight-membered rings. The TCP-1 family of proteins are weakly, but significantly [8],
related to the cpn60/groEL chaperonin family (see <PDOC00268>). As signature patterns of this family of chaperonins,
three conserved regions located in the N-terminal domain were chosen.
Consensus pattern: [RKEL]-[ST]-x-[LMFY]-G-P-x-[GSA]-x-x-K-[LIVMF](2)

Consensus pattern: [LIVM]-[TS]-[NK]-D-[GA]-[AVNHK]-[TAV]-[LIVM](2)-x(2)-[LIVM]-x-[LIVM]-x-[SNH]-[PQH]
Consensus pattern: Q-[DEK]-x-x-[LIVMGTA]-[GA]-D-G-T

[ 1] Ellis J. Nature 358:191-192(1992).

[ 2] Nelson R.J., Craig E.A. Curr. Biol. 2:487-489(1992).

[ 3] Lewis V.A., Hynes G.M., Zheng D., Saibil H., Willison K.R. Nature 358:249-252(1992).

[ 4] Kubota H., Hynes G., Carne A., Ashworth A., Willison K.R. Curr. Biol. 4:89-99(1994).

[ 5] Kim S., Willison K.R., Horwich A.L. Trends Biochem. Sci. 20:543-548(1994).

[ 6] Trent J.D., Nimmesgern E., Wall J.S., Hartl F.U., Horwich A.L. Nature 354:490-493(1991).

[ 7] Waldmann T., Lupas A., Kellermann J., Peters J., Baumeister W. Biol. Chem. Hoppe-Seyler 376:119-126(1995).

[ 8] Hemmingsen S.M. Nature 357:650-650(1992).

[0421] 125. cyclin (Cyclins)

The cyclins include an internal duplication, which is related to that found in TFIIB and the RB protein.

[1]
Medline: 94203808
Evidence for a protein domain superfamily shared by the cyclins, TFIIB and RB/p107.
Gibson TJ, Thompson JD, Blocker A, Kouzarides T;
Nucleic Acids Res 1994;22:946-952.

[2]
Medline: 96164440
The crystal structure of cyclin A.
Brown NR, Noble MEM, Endicott JA, Garman EF, Wakatsuki S,
Mitchell E, Rasmussen B, Hunt T, Johnson LN;
Structure 1995;3:1235-1247.
Complex of cyclin and cyclin-dependent kinase.

[3]
Medline: 96313126
Structural basis of cyclin-dependent kinase activation by phosphorylation.
Russo AA, Jeffrey PD, Pavletich NP;
Nat Struct Biol 1996;3:696-700.
Cyclins regulate cyclin-dependent kinases (CDKs).

The most divergent prosite members have been included. Swiss:P22674, the uracil-DNA glycosylase 2, is the highest
noise and may be related but has not been included.
Number of members: 169

[0422] Cyclins [1,2,3] are eukaryotic proteins which play an active role in controlling nuclear cell division cycles.
Cyclins, together with the p34 (cdc2) or cdk2 kinases, form the Maturation Promoting Factor (MPF). There are two
main groups of cyclins:

- G2/M cyclins, essential for the control of the cell cycle at the G2/M (mitosis) transition. G2/M cyclins accumulate
steadily during G2 and are abruptly destroyed as cells exit from mitosis (at the end of the M-phase).

- G1/S cyclins, essential for the control of the cell cycle at the G1/S (start) transition.

[0423] In most species, there are multiple forms of G1 and G2 cyclins. For example, in vertebrates, there are two
G2 cyclins, A and B, and at least three G1 cyclins, C, D, and E.

[0424] A cyclin homolog has also been found in herpesvirus saimiri [4].

[0425] The best conserved region is in the central part of the cyclins' sequences, known as the 'cyclin-box'. From
this, a 32 residue pattern has been derived.

- Consensus pattern: R-x(2)-[LIVMSA]-x(2)-[FYWS]-[LIVM]-x(8)-[LIVMFC]-x(4)-[LIVMFYA]-x(2)-[STAGC]-[LIVMFYQ]-x-[LIVMFYC]-[LIVMFY]-D-[RKH]-[LIVMFYW]
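
The stated length of 32 residues can be checked against the pattern itself by summing the repeat counts of its elements. This is a minimal sketch: the helper name pattern_length is hypothetical, and the pattern string is the cyclin-box consensus as reconstructed from the listing above.

```python
import re


def pattern_length(pattern):
    """Return (minimum, maximum) number of residues spanned by a PROSITE-style pattern."""
    shortest = longest = 0
    for element in pattern.strip().rstrip("-.").split("-"):
        # An element covers one residue unless it carries a repeat suffix like (8) or (4,5).
        m = re.search(r"\((\d+)(?:,(\d+))?\)$", element)
        if m:
            low = int(m.group(1))
            high = int(m.group(2)) if m.group(2) else low
        else:
            low = high = 1
        shortest += low
        longest += high
    return shortest, longest


CYCLIN_BOX = (
    "R-x(2)-[LIVMSA]-x(2)-[FYWS]-[LIVM]-x(8)-[LIVMFC]-x(4)-"
    "[LIVMFYA]-x(2)-[STAGC]-[LIVMFYQ]-x-[LIVMFYC]-[LIVMFY]-D-[RKH]-[LIVMFYW]"
)

print(pattern_length(CYCLIN_BOX))  # (32, 32)
```

Because the cyclin-box pattern has no variable-length gaps, the minimum and maximum are equal. A pattern with a range, such as one containing x(4,5), would return two different values.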

[ 1] Nurse P. Nature 344:503-508(1990).

[ 2] Norbury C., Nurse P. Curr. Biol. 1:23-24(1991).

[ 3] Lew D.J., Reed S.I. Trends Cell Biol. 2:77-81(1992).

[ 4] Nicholas J., Cameron K.R., Honess R.W. Nature 355:362-365(1992).

[0426] 126. Cystatin domain

This is a very diverse family. Attempts to define separate subfamilies have failed. Typically, either the N-terminal or
C-terminal end is very divergent, but splitting into two domains would make very short families. Cathelicidins are related
to this family but have not been included. Number of members: 147

[0427] Inhibitors of cysteine proteases [1,2,3], which are found in the tissues and body fluids of animals, in the larvae
of the worm Onchocerca volvulus [4], as well as in plants, can be grouped into three distinct but related families:

- Type 1 cystatins (or stefins), molecules of about 100 amino acid residues with neither disulfide bonds nor
carbohydrate groups.

- Type 2 cystatins, molecules of about 115 amino acid residues which contain one or two disulfide loops near their
C-terminus.

- Kininogens, which are multifunctional plasma glycoproteins.

[0428] They are the precursor of the active peptide bradykinin and play a role in blood coagulation by helping to
position optimally prekallikrein and factor XI next to factor XII. They are also inhibitors of cysteine proteases. Structurally,
kininogens are made of three contiguous type-2 cystatin domains, followed by an additional domain (of variable length)
which contains the sequence of bradykinin. The first of the three cystatin domains seems to have lost its inhibitory
activity.

[0429] In all these inhibitors, there is a conserved region of five residues which has been proposed to be important
for the binding to the cysteine proteases. The consensus pattern starts one residue before this conserved region.

- Consensus pattern: [GSTEQKRV]-Q-[LIVT]-[VAF]-[SAGQ]-G-x-[LIVMNK]-x(2)-[LIVMFY]-x-[LIVMFYA]-[DENQKRHSIV]

[ 1] Barrett A.J. Trends Biochem. Sci. 12:193-196(1987).
[ 2] Rawlings N.D., Barrett A.J. J. Mol. Evol. 30:60-71(1990).
[ 3] Turk V., Bode W. FEBS Lett. 285:213-219(1991).

[ 4] Lustigman S., Brotman B., Huima T., Prince A.M. Mol. Biochem. Parasitol. 45:65-76(1991).

[0430] 127. cytochrome_c (Cytochrome c)
The Pfam entry does not include all prosite members.
The cytochrome 556 and cytochrome c' families are not included.
Number of members: 259

[0431] In proteins belonging to the cytochrome c family [1], the heme group is covalently attached by thioether bonds to
two conserved cysteine residues. The consensus sequence for this site is Cys-X-X-Cys-His and the histidine residue
is one of the two axial ligands of the heme iron. This arrangement is shared by all proteins known to belong to the
cytochrome c family, which presently includes cytochromes c, c', c1 to c6, c550 to c556, cc3/Hmc, cytochrome f and reaction
center cytochrome c.

- Consensus pattern: C-{CPWHF}-{CPWR}-C-H-{CFYW}
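
In this pattern the curly braces list residues that are not permitted at a position, so each braced element becomes a negated character class in a regular expression. The sketch below locates Cys-X-X-Cys-His sites in a sequence under that reading and reports the positions of the two cysteines and the histidine. The function name find_heme_sites and the input fragment are assumptions made for illustration and do not come from the source.

```python
import re

# C-{CPWHF}-{CPWR}-C-H-{CFYW}: each braced set is translated to a negated class.
HEME_SITE = re.compile(r"C[^CPWHF][^CPWR]CH[^CFYW]")


def find_heme_sites(sequence):
    """Return 1-based (first Cys, second Cys, His) positions for each match."""
    sites = []
    for m in HEME_SITE.finditer(sequence):
        first_cys = m.start() + 1
        sites.append((first_cys, first_cys + 3, first_cys + 4))
    return sites


# Illustrative test fragment containing a single Cys-X-X-Cys-His site.
fragment = "GDVEKGKKIFVQKCAQCHTVEK"
print(find_heme_sites(fragment))  # [(14, 17, 18)]
```

The two cysteines are the residues that would carry the thioether bonds to the heme, and the histidine is the axial ligand described in the paragraph above. A site such as CPACH is rejected because proline is excluded at the first variable position.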

[0432] [ 1] Mathews F.S. Prog. Biophys. Mol. Biol. 45:1-56(1985).

[0433] 128. (DAGKa) Diacylglycerol kinase accessory domain (presumed)

[0434] Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. This domain is assumed
to be an accessory domain: its function is unknown.

[0435] [1] Sakane F, Yamada K, Kanoh H, Yokoyama C, Tanabe T, Nature 1990;344:345-348. [2] Sakane F, Imai S,
Kai M, Wada I, Kanoh H, J Biol Chem 1996;271:8394-8401. [3] Schaap D, de Widt J, van der Wal J, Vandekerckhove
J, van Damme J, Gussow D, Ploegh HL, van Blitterswijk WJ, van der Bend RL, FEBS Lett 1990;275:151-158. [4]
Kanoh H, Yamada K, Sakane F, Trends Biochem Sci 1990;15:47-50.
[0436] 129. (DAGKc) Diacylglycerol kinase catalytic domain (presumed)

[0437] Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. The catalytic domain
is assumed from the finding of bacterial homologues.

[0438] [1] Sakane F, Yamada K, Kanoh H, Yokoyama C, Tanabe T, Nature 1990;344:345-348. [2] Sakane F, Imai S,
Kai M, Wada I, Kanoh H, J Biol Chem 1996;271:8394-8401. [3] Schaap D, de Widt J, van der Wal J, Vandekerckhove
J, van Damme J, Gussow D, Ploegh HL, van Blitterswijk WJ, van der Bend RL, FEBS Lett 1990;275:151-158. [4]
Kanoh H, Yamada K, Sakane F, Trends Biochem Sci 1990;15:47-50.

so [0439] 130. D-aminoackioxklases signature(DAO) 

[0440] D-amino acid oxidase (EC 1.4.3.3 ) (DAMOX or DAO) is an FAD flavoenzyme that catalyzes the oxidation of 
neutral and basic D-amino acids into their corresponding keto acids. DAOs have been characterized and sequenced 
in fungi and vertebrates where they are known to be kx^ated in the peroxisomes. D-aspartate oxidase (EC 1.4.3.1 ) 
(DASOX) [1] is an enzyme, structurally related to DAO, which catalyzes the same reactkMi but is active only toward 
dicarboxylk^ D-amino acids. In DAO, a consented histkiine has been shown [2] to be important for the enzyme's catalytk: 
activity. The conserved region around this residue has been devebped as a signature pattern for these enzymes. 
[0441] Consensus pattern: [LIVM](2)-H-[NHA]-Y-G-x-[GSA](2)-x-G-x(5)-G-x-A [H is a probable active site residue]
[ 1] [...] Tedeschi G., Simonic T., Ronchi S. J. Biol. Chem. 267:11865-11871(1992).

[ 2] Miyano M., Fukui K., Watanabe F., Takahashi S., Tada M., Kanashiro M., Miyake Y. J. Biochem. 109:171-177.
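A consensus pattern of this kind can be applied mechanically to a protein sequence. The following is a minimal sketch only, not part of the original disclosure. It assumes the PROSITE-style syntax used in the consensus pattern lines (x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repeat counts) and translates the D-amino acid oxidase pattern into a Python regular expression. The function name prosite_to_regex and the test sequence are hypothetical and serve only as illustration; anchors such as < and > are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper: supports x, [..], {..} and (n) / (n,m) repeats only.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate the residue specification from an optional repeat suffix.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # x stands for any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {..} lists excluded residues
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(core + repeat)
    return "".join(parts)


DAO_PATTERN = "[LIVM](2)-H-[NHA]-Y-G-x-[GSA](2)-x-G-x(5)-G-x-A"
DAO_REGEX = prosite_to_regex(DAO_PATTERN)

# Invented toy sequence, not a real D-amino acid oxidase.
toy_sequence = "MKLVHNYGAGGSGTTTTTGAAQ"
hit = re.search(DAO_REGEX, toy_sequence)
if hit:
    # The histidine is the third residue of the pattern (report 1-based).
    print("signature at", hit.start() + 1, "histidine at", hit.start() + 3)
```

A regular expression is used here because each pattern element maps onto exactly one regular-expression token, which keeps the translation short and easy to check by eye.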

[0442] 131. DEAD and DEAH box families ATP-dependent helicases signatures

A number of eukaryotic and prokaiydic proteins have been characterijed ii ? "ii /« .k» k . ... • 

ATPase and DNA-helicase activities in vitro It fe iSd t^^-'jJJ' ^"^J"^^ P68 has 

putative RNA helicase related to p68 - DBP2 a veT^Sf ii^^Si^ nS^*"" * ^"^^ ^'^^^ ^ ^rosophila 
protein Involved in rtoosome ass2*ly MM^nl^o^il!!^!?^ ' ^ ' ^^^l . a yeast 

ROK1. a yeas, protein, -steia. a fissi^^yrS^en^roZ^ffaorS^^^^^^ m''"''^ P'^^""^" * 

specification of embryonic posterior structures Mesi B l^r'^^SS^!, ^ 1 ^ ' ^"^ 

lion. - dbpA. an Escf;e,ichia«>li puta rRNA heTise Z^fj^^T^ T^""^ ""^'"^ 

suppress a mutation in the r^ene for r^^^olersz ^S Z^S'^T""^ '^'^ '""^''^ 

rhlE. an Escherichia coli putat^e RNA h JJrasl^Ji Ln P^htii* T Es«*,enchB coli putative RNA helicase. - 

activity. It probably interacts^th 23S r^^i^A^^ f^^^ 

hypothetical protein SpAC31A2.07c. - Bacillus subtilis hypothetical oroteTwiN^i^htrn^ k " 

pu;*, JL,'^. »«:r.re rzr ■'"^•"^'^■^^•»» 

[0443] Consensus pattern: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]
Consensus pattern: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR]

Note: proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop) (see the relevant
entry <PDOC00017>).

[ 1] Schmid S.R., Linder P. Mol. Microbiol. 6:283-292(1992).

[ 2] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., [...]

[ 3] Wassarman D.A., Steitz J.A. Nature 349:463-464(1991).

[ 4] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).

[ 5] Harosh I., Deschavanne P. Nucleic Acids Res. 19:6331-6331(1991).

[ 6] Koonin E.V., Senkevich T.G. J. Gen. Virol. 73:989-993(1992).
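Because the family is described by two separate signatures, a sequence can be assigned to the DEAD or DEAH subfamily according to which signature it contains. The following minimal sketch is illustrative only; it assumes the two consensus patterns as read above, written directly as Python regular expressions. The function name classify_helicase and the sequence fragments are hypothetical and are not taken from real helicases.

```python
import re

# Regular-expression forms of the two consensus patterns (assumed readings).
SIGNATURES = {
    "DEAD": re.compile(r"[LIVMF]{2}DEAD[RKEN].[LIVMFYGSTN]"),
    "DEAH": re.compile(r"[GSAH].[LIVMF]{3}DE[ALIV]H[NECR]"),
}


def classify_helicase(sequence):
    """Return the names of all signatures found in the sequence."""
    return [name for name, regex in SIGNATURES.items() if regex.search(sequence)]


# Invented toy fragments used only to exercise the two patterns.
dead_like = "AAVLDEADRMLGKK"
deah_like = "GSVIVIDEAHERTT"
unrelated = "MKTAYIAKQR"

for fragment in (dead_like, deah_like, unrelated):
    print(fragment, classify_helicase(fragment))
```

A hit with either signature is only suggestive; as the note above states, members of this family also carry the ATP/GTP-binding motif 'A', which this sketch does not check.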

K Ifn'iT'^T"'^ 3.4-dihydroxy-2-butanone 4.phosphate synthase 

r04471 1 rr^uADcf ^ ^ ' ^ ^acher A. Methods Enzymol 1 997280-374-382 

[0447] 133. (DHDPS) Dihydrodipicolinate synthetase signatures

Dihydrodipicolinate synthetase (EC 4.2.1.52) (DHDPS) [1] catalyzes, in higher plants chloroplast and in many bacteria
(gene dapA), the first reaction specific to the biosynthesis of lysine and of diaminopimelate. DHDPS is responsible for
the condensation of aspartate semialdehyde and pyruvate by a ping-pong mechanism in which pyruvate first binds to
the enzyme by forming a Schiff-base with a lysine residue. Three other proteins are structurally related to DHDPS and
probably also act via a similar catalytic mechanism: - Escherichia coli N-acetylneuraminate lyase (EC 4.1.3.3) (gene
nanA), which catalyzes the condensation of N-acetyl-D-mannosamine and pyruvate to form N-acetylneuraminate. -
Rhizobium meliloti protein mosA [3], which is involved in the biosynthesis of the rhizopine 3-O-methyl-scyllo-inosamine.
- Escherichia coli hypothetical protein yjhH. Two signature patterns for these enzymes were developed. The first one
is centered on a highly conserved region in the N-terminal part of these proteins. The second signature contains a lysine
residue which has been shown, in Escherichia coli dapA [2], to be the one that forms a Schiff-base with the substrate.

[0448] Consensus pattern: [GSA]-[LIVM]-[LIVMFY]-x(2)-G-[ST]-[TG]-G-E-[GASNF]-x(6)-[EQ]

Consensus pattern: Y-[DNS]-[LIVMFA]-P-x(2)-[ST]-x(3)-[LIVMG]-x(13,14)-[LIVM]-x-[SGA]-[LIVMF]-K-[DEQAF]-
[STAC] [K is involved in Schiff-base formation]

[ 1] Kaneko T., Hashimoto T., Kumpaisal R., Yamada Y. J. Biol. Chem. 265:17451-17455(1990).
[ 2] Laber B., Gomis-Rueth F.-X., Romao M.J., Huber R. Biochem. J. 288:691-695(1992).

[ 3] Murphy P.J., Trenz S.P., Grzemski W., de Bruijn F.J., Schell J. J. Bacteriol. 175:5193-5204(1993).

[0449] 134. (DHOdehase) Dihydroorotate dehydrogenase signatures 

Dihydroorotate dehydrogenase (EC 1.3.3.1) (DHOdehase) catalyzes the fourth step in the de novo biosynthesis of
pyrimidine, the conversion of dihydroorotate into orotate. DHOdehase is a ubiquitous FAD flavoprotein. In bacteria
(gene pyrD), DHOdehase is located on the inner side of the cytosolic membrane. In some yeasts, such as in
Saccharomyces cerevisiae (gene URA1), it is a cytosolic protein while in other eukaryotes it is found in the mitochondria [1].
The sequence of DHOdehase is rather well conserved and two signature patterns were developed specific to this
enzyme. The first corresponds to a region in the N-terminal section of the enzyme while the second is located in the
C-terminal section and seems to be part of the FAD-binding domain.

Consensus pattern: [GS]-x(4)-[GK]-[GSTA]-[LIVFSTA]-[GT]-x(3)-[NQR]-x-G-[NHY]-x(2)-P-[RT]
[0450] Consensus pattern: [LIVM](2)-[GSA]-x-G-G-[IV]-x-[STGDN]-x(3)-[ACV]-x(6)-G-A
[0451] [ 1] Nagy M., Lacroute F., Thomas D. Proc. Natl. Acad. Sci. U.S.A. 89:8966-8970(1992).
[0452] 135. (DMRL_synthase) 6,7-dimethyl-8-ribityllumazine synthase

[0453] 136. (DNA_methylase) C-5 cytosine-specific DNA methylases signatures

C-5 cytosine-specific DNA methylases (EC 2.1.1.73) (C5 Mtase) are enzymes that specifically methylate the C-5 carbon
of cytosines in DNA [1,2,3]. Such enzymes are found in the proteins described below. - As a component of type II
restriction-modification systems in prokaryotes and some bacteriophages. Such enzymes recognize a specific DNA
sequence where they methylate a cytosine. In doing so, they protect DNA from cleavage by type II restriction enzymes
that recognize the same sequence. The sequences of a large number of type II C-5 Mtases are known. - In vertebrates,
there are a number of C-5 Mtases that methylate CpG dinucleotides. The sequence of the mammalian enzyme is
known. C-5 Mtases share a number of short conserved regions. Two of them were selected. The first is centered around
a conserved Pro-Cys dipeptide in which the cysteine has been shown [4] to be involved in the catalytic mechanism; it
appears to form a covalent intermediate with the C6 position of cytosine. The second region is located at the C-terminal
extremity in type-II enzymes.

[0454] Consensus pattern: [DENKS]-x-[FLIV]-x(2)-[GSTC]-x-P-C-x(2)-[FYWLIM]-S [C is the active site residue]
Consensus pattern: [RKQGTF]-x(2)-G-N-[STAG]-[LIVMF]-x(3)-[LIVMT]-x(3)-[LIVM]-x(3)-[LIVM]

[ 1] Posfai J., Bhagwat A.S., Roberts R.J. Gene 74:261-263(1988).
[ 2] Kumar S., Cheng X., Klimasauskas S., Mi S., Posfai J., Roberts R.J., Wilson G.G. Nucleic Acids Res. 22:1-10
(1994).

[ 3] Lauster R., Trautner T.A., Noyer-Weidner M. J. Mol. Biol. 206:305-312(1989).

[ 4] Chen L., McMillan A.M., Chang W., Ezaz-Nikpay K., Lane W.S., Verdine G.L. Biochemistry 30:11018-11025
(1991).

[0455] 137. (DNAphotolyase) DNA photolyases class 2 signatures

Deoxyribodipyrimidine photolyase (EC 4.1.99.3) (DNA photolyase) [1,2] is a DNA repair enzyme. It binds to UV-damaged
DNA containing pyrimidine dimers and, upon absorbing a near-UV photon (300 to 500 nm), breaks the cyclobutane
ring joining the two pyrimidines of the dimer. DNA photolyase is an enzyme that requires two chromophore-cofactors
for its activity: a reduced FADH2 and either 5,10-methenyltetrahydrofolate (5,10-MTHF) or an oxidized
8-hydroxy-5-deazaflavin (8-HDF) derivative (F420). The folate or deazaflavin chromophore appears to function as an antenna,
while the FADH2 chromophore is thought to be responsible for electron transfer. On the basis of sequence similarities
[3] DNA photolyases can be grouped into two classes. The second class contains enzymes from Myxococcus xanthus,




Consensus pattern: G-x-H-D-x(2)-W-x-E-R-x-[LIVM]-F-G-K-[LIVM]-R-[FY]-M-N

[ 1] Sancar G.B., Sancar A. Trends Biochem. Sci. 12:259-261(1987).
[ 2] Jorns M.S. Biofactors 2:207-211(1990).

[ 3] Yasui A., Eker A.P.M., Yasuhira S., Yajima H., Kobayashi T., Takao M., Oikawa A. EMBO J. 13:6143-6151

[0456] (DNAphotolyase2) DNA photolyases class 1 signatures

Deoxyribodipyrimidine photolyase (EC 4.1.99.3) (DNA photolyase) [1,2] is a DNA repair enzyme. It binds to UV-damaged
DNA containing pyrimidine dimers and, upon absorbing a near-UV photon (300 to 500 nm), breaks the cyclobutane
ring joining the two pyrimidines of the dimer. DNA photolyase is an enzyme that requires two chromophore-cofactors
for its activity: a reduced FADH2 and either 5,10-methenyltetrahydrofolate (5,10-MTHF) or an oxidized
8-hydroxy-5-deazaflavin (8-HDF) derivative (F420). The folate or deazaflavin chromophore appears to function as an antenna,
while the FADH2 chromophore is thought to be responsible for electron transfer. On the basis of sequence
similarities [3] DNA photolyases can be grouped into two classes. The first class contains enzymes from Gram-negative
and Gram-positive bacteria, the halophilic archaebacteria [...], fungi and plants. Class 1 enzymes
bind either 5,10-MTHF (E. coli, fungi, etc.) or 8-HDF [...]. [...] Arabidopsis
cryptochromes 1 (CRY1) and 2 (CRY2) [...] light-induced gene expression.
There are a number of conserved sequence regions in all class 1 DNA photolyases, especially in the
C-terminal part. Two of these regions were selected as signature patterns.

[0457] Consensus pattern: T-G-x-P-[LIVM](2)-D-A-x-M-[RA]-x-[LIVM]

Consensus pattern: [DN]-R-x-R-[LIVM](2)-x-[STA](2)-F-[LIVMFA]-x-K-x-L-x(2,3)-W-[KRQ]

[ 1] Sancar G.B., Sancar A. Trends Biochem. Sci. 12:259-261(1987).
[ 2] Jorns M.S. Biofactors 2:207-211(1990).

[ 3] Yasui A., Eker A.P.M., Yasuhira S., Yajima H., Kobayashi T., Takao M., Oikawa A. EMBO J. 13:6143-6151
[ 4] Lin C., Ahmad M., Cashmore A.R. Plant J. 10:893-902(1996).

[0458] 138. (DNA_pol_A)

DNA polymerase family A signature

r^ele^^^^^^ ^^^.^T" ^"^^^^"^ ^^^^^ -^''-^^ ^^A. They 

DNA polymerase family A. The pofymerases that belong to this famf^ ^1^21^ ^ ^ ^ ^"'"^'^ ^ 

- Escherichia coli and various other bacterial polymerase I (gene polA).

- Thermus aquaticus Taq polymerase.

- Bacteriophage SPO1 polymerase.

- Bacteriophage SPO2 polymerase.

- Bacteriophage T5 polymerase.

- Bacteriophage T7 polymerase.

- Mycobacteriophage L5 polymerase.

- Yeast mitochondrial polymerase gamma (gene MIP1).

•^UT^':^^.^:^::^^^^^ ^ -se^ed region. Kno«. as 

substrates; it contains a conserve^ r.^osL h J k «<> bind deoxynucleolide triphosphate 
aconse.edVsine,a,sopT.:Smrrcar^^^^^^^^ 

was used as a signaiure for this family of DNA polymerases Pyndoxal phosphate. This consented region 

[0460] Consensus pattern: R-x(2)-[GSAV]-K-x(3)-[LIVMFY]-[AGQ]-x(2)-Y-x(2)-[GS]-x(3)-[LIVMA] Sequences known
to belong to this class detected by the pattern ALL.

[ 1] Delarue M., Poch O., Tordo N., Moras D., Argos P. Protein Eng. 3:461-467(1990).
[ 2] Ito J., Braithwaite D.K. Nucleic Acids Res. 19:4045-4057(1991).
[ 3] Braithwaite D.K., Ito J. Nucleic Acids Res. 21:767-802(1993).

[0461] 139. (DNA_pol_viral_C)
DNA polymerase (viral) C-terminal domain
Number of members: 128
[0462] 140. (DNA_topoisoII)
DNA topoisomerase II signature

DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,E1] is one of the two types of enzyme that catalyze the interconversion
of topological DNA isomers. Type II topoisomerases are ATP-dependent and act by passing a DNA segment through
a transient double-strand break. Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes, and
in African Swine Fever virus (ASF). In bacteriophage T4 topoisomerase II consists of three subunits (the product of
genes 39, 52 and 60). In prokaryotes and in archaebacteria the enzyme, known as DNA gyrase, consists of two subunits
(genes gyrA and gyrB [E2]). In some bacteria, a second type II topoisomerase has been identified; it is known as
topoisomerase IV and is required for chromosome segregation; it also consists of two subunits (genes parC and parE).
In eukaryotes, type II topoisomerase is a homodimer.

[0463] There are many regions of sequence homology between the different subtypes of topoisomerase II. The
relation between the different subunits is shown in the following representation:

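    <------------------ About 1400 residues ------------------>

    [ Protein 39-* ][ ------------ Protein 52 ------------ ]   Phage T4
    [ gyrB-* ------][ ------------ gyrA ------------------ ]   Prokaryote II, Archaebacteria
    [ parE-* ------][ ------------ parC ------------------ ]   Prokaryote IV
    [ -------*-------------------------------------------- ]   Eukaryote and ASF

    '*': Position of the pattern.
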

[0464] As a signature pattern for this family of proteins, a region that contains a highly conserved pentapeptide was
selected. The pattern is located in gyrB, in parE, and in protein 39 of phage T4 topoisomerase.
[0465] Consensus pattern: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG] Sequences known to belong to this class detected by
the pattern ALL.

[ 1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990).
[ 2] Bjornsti M.-A. Curr. Opin. Struct. Biol. 1:99-103(1991).
[ 3] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47(1995).

[ 4] Roca J. Trends Biochem. Sci. 20:156-160(1995).

[0466] 141. (DSPc) Tyrosine specific protein phosphatases signature and profiles

Tyrosine specific protein phosphatases (EC 3.1.3.48) (PTPase) [1 to 5] are enzymes that catalyze the removal of a
phosphate group attached to a tyrosine residue. These enzymes are very important in the control of cell growth,
proliferation, differentiation and transformation. Multiple forms of PTPase have been characterized and can be classified
into two categories: soluble PTPases and transmembrane receptor proteins that contain PTPase domain(s). The
currently known PTPases are listed below: Soluble PTPases. - PTPN1 (PTP-1B). - PTPN2 (T-cell PTPase; TC-PTP). -
PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an N-terminal band 4.1-like domain (see <PDOC00566>) and
could act at junctions between the membrane and cytoskeleton. - PTPN5 (STEP). - PTPN6 (PTP-1C; HCP; SHP) and
PTPN11 (PTP-2C; SH-PTP3; Syp), enzymes which contain two copies of the SH2 domain at their N-terminal extremity.
The Drosophila protein corkscrew (gene csw) also belongs to this subgroup. - PTPN7 (LC-PTP; Hematopoietic
protein-tyrosine phosphatase; HePTP). - PTPN8 (70Z-PEP). - PTPN9 (MEG2). - PTPN12 (PTP-G1; PTP-P19). - Yeast PTP1.



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" oil . t^/o^ed in the ubiquitin-mediated pnMein degradation pathway. - Fission yeast dvdI and 

yc^H^-Autogra^calrfomK^nuclearpolyhedrosisvlru^ 

MAP kinase phosphatase-1; MKP-1); which dephosphorylates MAP kinase on both Thr-183 and Tyr-185. - DUSP2
(PAC-1), a nuclear enzyme that dephosphorylates MAP kinases ERK1 and ERK2 on both Thr and Tyr residues. -
DUSP3 (VHR). - DUSP4 (HVH2). - DUSP5 (HVH3). - DUSP6 (Pyst1; MKP-3). - DUSP7 (Pyst2; MKP-X). - Yeast MSG5,
a PTPase that dephosphorylates MAP kinase FUS3. - Yeast YVH1. - Vaccinia virus H1 PTPase; a dual specificity
phosphatase. Receptor PTPases. Structurally, all known receptor PTPases are made up of a variable length
extracellular domain, followed by a transmembrane region and a C-terminal catalytic cytoplasmic domain. Some of the
receptor PTPases contain fibronectin type III (FN-III) repeats, immunoglobulin-like domains, MAM domains [...]

J^-^T,J^ . . '° » inactive but seems toTflect substrate 

spec jcrty of the first. In these domains, the catalytic cysteine is generally consented but sJS o^'^^^SS 

2 0 0 2LeuK«:y.e antigen related (LAB, 3 8 of2'f^<^rDl^rg?o"S2^^ 

(LRP) 0 00 0 2PTP-beta 016 00 IPTP-gamma 0 1 loipTPHtella 0 >7 0 0 2^^'1^?Jo q gP^''^^^^^^^^ 

01 2PTP-mu 1401 2PrP-zeta 0110 2PTPase domains consist of about 300 Zo ad2rThere IIZ^^^^ 

resKJues .n rts .mmediate vc.nity have also been shown to be important A signature pattern for PTP^ S^^ w2 

toTe P?? suw^r " ^^a^speciflcity PTPases'^d .he Z^Z 

[0467] Consensus pattern: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY] [C is the active site residue]

[ 1] Fischer E.H., Charbonneau H., Tonks N.K. Science 253:401-406(1991).
[ 2] Charbonneau H., Tonks N.K. Annu. Rev. Cell Biol. 8:463-493(1992).
[ 3] Trowbridge I.S. J. Biol. Chem. 266:23517-23520(1991).
[ 4] Tonks N.K., Charbonneau H. Trends Biochem. Sci. 14:497-500(1989).
[ 5] Hunter T. Cell 58:1013-1016(1989).
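Since the cysteine in the PTPase signature is the active site residue, a scan can report the position of that cysteine rather than only the start of the match. The following minimal sketch is illustrative only. It assumes the consensus pattern as read above, written as a Python regular expression inside a lookahead so that overlapping occurrences are also reported. The function name active_site_cysteines and the sequence fragment are hypothetical.

```python
import re

# Regex form of the PTPase consensus pattern (assumed reading); the lookahead
# makes each match zero-width so that overlapping hits are not skipped.
PTPASE = re.compile(r"(?=([LIVMF]HC.{2}G.{3}[STC][STAGP].[LIVMFY]))")
CYS_OFFSET = 2  # the cysteine is the third residue of the pattern


def active_site_cysteines(sequence):
    """Return 1-based positions of the putative active-site cysteine."""
    return [m.start() + CYS_OFFSET + 1 for m in PTPASE.finditer(sequence)]


# Invented toy fragment used only to exercise the pattern.
toy = "MKVHCSAGIGRSGTFLAA"
positions = active_site_cysteines(toy)
for pos in positions:
    print("putative active-site residue", toy[pos - 1], "at position", pos)
```

Positions are reported 1-based, as is usual for protein sequence numbering.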

[0468] 142. (DUF10) Uncharacterized protein family UPF0076 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Goat antigen UK114, its
human homolog and the rat corresponding protein, which is known as perchloric acid soluble protein (PSP) [2]. PSP
may inhibit an initiation stage of cell-free protein synthesis. - Mouse heat-responsive protein HRSP12. - Yeast
chromosome V hypothetical protein YER057c. - Yeast chromosome IX hypothetical protein YIL051c. - Caenorhabditis
Z'^st^^^^y^ ' f """"^''^ ^yPometical p^tein ycdK. - EscheScoJ^ p.t 
T j!^ : , ^ hypothetcal protein yjgF and HI071 9. the corresponding Haemophilus influenzae orot^n 

r\ ^^'""^ ^ypo.hetk.l'JIiein'yabJ. ^^C^^flS^re 

Sm? c 1 ; Py'°" hypothetfcal protein HP0944. - Lactococcus lactis aldR - Myxococcus 

[0469] Consensus pattern: [PA]-[ASTPV]-R-[SACVF]-x-[LIVMFY]-x(2)-[GSAKR]-x-[LMVA]-x(5,8)-[LIVM]-E-[MI]

[ 1] Bairoch A. Unpublished observations (1995).

ifjgf """^ ^•^^ ^ ' "^9 Y.-M.. Suzuki I.. Munoz S.. Natori Y J. Biol. Chem 270 r^n^n.^n.. 

[0470] 143. (DUF3)Domain of Unknown Function 3 

Domain apparently occurring exclusively in eubacteria. Unknown function.

[0471] 144. (DUF6) Integral membrane protein 

Z'SJs": meXrejr' "^"^'"^ '""^'^ °' -"•^^'^ <>• P-^*- -tain 

[0473] 145. (DUF7) Integral membrane protein 

[0474] This family includes many hypothetical membrane proteins of unknown function. Swiss:P14502 has been
implicated in resistance to ethidium bromide.

[0475] 146. (DapB) Dihydrodipicolinate reductase signature 

Dihydrodipicolinate reductase (EC 1.3.1.26) catalyzes the second step in the biosynthesis of diaminopimelic acid and
lysine, the NAD or NADP-dependent reduction of 2,3-dihydrodipicolinate into 2,3,4,5-tetrahydrodipicolinate. This
enzyme is present in bacteria (gene dapB) and higher plants. As a signature pattern the best conserved region in this
enzyme was selected. It is located in the central section and is part of the substrate-binding region [1].
[0476] Consensus pattern: E-[IV]-x-E-x-H-x(3)-K-x-D-x-P-S-G-T-A
[0477] [ 1] Scapin G., Blanchard J.S., Sacchettini J.C. Biochemistry 34:3502-3512(1995).
[0478] 147. DedA family 

[0479] This family combines the DedA related proteins and YIAN/YGIK family. Members of this family are not
functionally characterised. These proteins contain multiple predicted transmembrane regions.
[0480] 148. DegT/DnrJ/EryCI/StrS family 

[0481] The members of this family exhibit some characteristics of the sensor protein of two-component signal
transduction systems; however, none of the members show any sequence similarity to these protein kinases. The members
of this family do have the typical helix-turn-helix motif of DNA binding proteins.

[0482] [1] Stutzman-Engwall KJ, Otten SL, Hutchinson CR, J Bacteriol 1992;174:144-154. 
[0483] 149. (Desaturase) Fatty acid desaturases signatures

Fatty acid desaturases (EC 1.14.99.-) are enzymes that catalyze the insertion of a double bond at the delta position
of fatty acids. There seem to be two distinct families of fatty acid desaturases which do not seem to be evolutionarily
related. Family 1 is composed of: - Stearoyl-CoA desaturase (SCD) (EC 1.14.99.5) [1]. SCD is a key regulatory enzyme
of unsaturated fatty acid biosynthesis. SCD introduces a cis double bond at the delta(9) position of fatty acyl-CoA's
such as palmitoleoyl- and oleoyl-CoA. SCD is a membrane-bound enzyme that is thought to function as a part of a
multienzyme complex in the endoplasmic reticulum of vertebrates and fungi. As a signature pattern for this family a
conserved region in the C-terminal part of these enzymes was selected; this region is rich in histidine residues and in
aromatic residues. Family 2 is composed of: - Plants stearoyl-acyl-carrier-protein desaturase (EC 1.14.99.6) [2]; these
enzymes catalyze the introduction of a double bond at the delta(9) position of stearoyl-ACP to produce oleoyl-ACP.
This enzyme is responsible for the conversion of saturated fatty acids to unsaturated fatty acids in the synthesis of
vegetable oils. - Cyanobacteria desA [3], an enzyme that can introduce a second cis double bond at the delta(12)
position of fatty acid bound to membranes glycerolipids. DesA is involved in chilling tolerance; the phase transition
temperature of lipids of cellular membranes being dependent on the degree of unsaturation of fatty acids of the
membrane lipids. As a signature pattern for this family a conserved region in the C-terminal part of these enzymes was
selected.

[0484] Consensus pattern: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y
Consensus pattern: [ST]-[SA]-x(3)-[QR]-[LI]-x(5,6)-D-Y-x(2)-[LIVMFYW]-[LIVM]-[DE]
[ 1] Kaestner K.H., Ntambi J.M., Kelly T.J. Jr., Lane M.D. J. Biol. Chem. 264:14755-14761(1989).
[ 2] Shanklin J., Somerville C.R. Proc. Natl. Acad. Sci. U.S.A. 88:2510-2514(1991).
[ 3] Wada H., Gombos Z., Murata N. Nature 347:200-203(1990).

[0485] 150. Dihydroorotase signatures

Dihydroorotase (EC 3.5.2.3) (DHOase) catalyzes the third step in the de novo biosynthesis of pyrimidine, the conversion
of ureidosuccinic acid (N-carbamoyl-L-aspartate) into dihydroorotate. Dihydroorotase binds a zinc ion which is required
for its catalytic activity [1]. In bacteria, DHOase is a dimer of identical chains of about 400 amino-acid residues (gene
pyrC). In higher eukaryotes, DHOase is part of a large multi-functional protein known as 'rudimentary' in Drosophila
and CAD in mammals and which catalyzes the first three steps of pyrimidine biosynthesis [2]. The DHOase domain is
located in the central part of this polyprotein. In yeasts, DHOase is encoded by a monofunctional protein (gene URA4).
However, a defective DHOase domain [3] is found in a multifunctional protein (gene URA2) that catalyzes the first two
steps of pyrimidine biosynthesis. The comparison of DHOase sequences from various sources shows [4] that there
are two highly conserved regions. The first, located in the N-terminal extremity, contains two histidine residues suggested
[3] to be involved in binding the zinc ion. The second is found in the C-terminal part. Signature patterns for both regions
have been developed. Allantoinase (EC 3.5.2.5) is the enzyme that hydrolyzes allantoin into allantoate. In yeast (gene
DAL1) [5], it is the first enzyme in the allantoin degradation pathway; in amphibians [6] and fish it catalyzes the second
step in the degradation of uric acid. The sequence of allantoinase is evolutionarily related to that of DHOases.

[0486] Consensus pattern: D-[LIVMFYWSAP]-H-[LIVA]-H-[LIVF]-[RN]-x-[PGANF] [The two H's are probable zinc
ligands]

Consensus pattern: [GA]-[ST]-D-x-A-P-H-x(4)-K

[ 1] Brown D.C., Collins K.D. J. Biol. Chem. 266:1597-1604(1991).
[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[ 3] Souciet J.-L., Nagy M., Le Gouar M., Lacroute F., Potier S. Gene 79:59-70(1989).
[ 4] Guyonvarch A., Nguyen-Juilleret M., Hubert J.-C., Lacroute F. Mol. Gen. Genet. 212:134-141(1988).
[ 5] Buckholz R.G., Cooper T.G. Yeast 7:913-923(1991).

[ 6] Hayashi S., Jain S., Chu R., Alvares K., Xu B., Erfurth F., Usuda N., Rao M.S., Reddy S.K., Noguchi T., Reddy
J.K., Yeldandi A.V. J. Biol. Chem. 269:12269-12276(1994).

[0487] 151. dnaJ domains signatures and profile

[0488] The prokaryotic heat shock protein dnaJ interacts with the chaperone hsp70-like dnaK protein [1]. Structurally,
the dnaJ protein consists of an N-terminal conserved domain (called 'J' domain) of about 70 amino acids, a
glycine-rich region ('G' domain) of about 30 residues, a central domain containing four repeats of a CXXCXGXG motif ('CRR'
domain) and a C-terminal region of 120 to 170 residues. Such a structure is shown in the following schematic
representation:

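    +------------+-+-------+-----+-----------+--------------------------------+
    | N-terminal | | Gly-R |     | CXXCXGXG  | C-terminal                     |
    +------------+-+-------+-----+-----------+--------------------------------+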
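The 'CRR' domain described above is defined by four repeats of the CXXCXGXG motif, so the number of repeats in a candidate sequence can be counted directly. The following minimal sketch is illustrative only. It assumes that X stands for any residue and counts non-overlapping occurrences of the motif; the function name count_crr_repeats and the toy domain are hypothetical and are not taken from a real dnaJ protein.

```python
import re

# CXXCXGXG with X read as "any residue".
CXXCXGXG = re.compile(r"C..C.G.G")


def count_crr_repeats(sequence):
    """Count non-overlapping CXXCXGXG repeats in the sequence."""
    return len(CXXCXGXG.findall(sequence))


# Invented toy domain: four motif copies joined by short spacers.
repeats = ["CPTCHGTG", "CSTCNGQG", "CPHCRGEG", "CTKCEGSG"]
toy_domain = "MA" + "AAK".join(repeats) + "KK"

n_repeats = count_crr_repeats(toy_domain)
print("CXXCXGXG repeats found:", n_repeats)
```

A count of four is consistent with a complete 'CRR' domain as described; a lower count may indicate a partial domain or a protein that carries only the 'J' domain.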

[0489] It has been shown [2] that the 'J' domain as well as the 'CRR' domain are also found in other prokaryotic and
eukaryotic proteins which are listed below.

a) Proteins containing both a 'J' and a 'CRR' domain:

- Yeast protein MAS5/YDJ1, which seems to be involved in mitochondrial protein import.

- Yeast protein MDJ1, involved in mitochondrial biogenesis and protein folding.

- Yeast protein SCJ1, involved in protein sorting.

- Yeast protein XDJ1.

- Plants dnaJ homologs (from leek and cucumber).

- Human HDJ2, a dnaJ homolog of unknown function.

- Yeast hypothetical protein YNL077w.

b) Proteins containing a 'J' domain without a 'CRR' domain:

- Rhizobium fredii nolC, a protein involved in cultivar-specific nodulation of soybean.

- Escherichia coli cbpA [3], a protein that binds curved DNA.

- Yeast protein SEC63/NPL1, important for protein assembly into the endoplasmic reticulum and the nucleus.

- Yeast protein SIS1, required for nuclear migration during mitosis.

- Yeast protein CAJ1.

- Yeast hypothetical protein YFR041c.

- Yeast hypothetical protein YIR004w.

- Yeast hypothetical protein YJL162c.

- Plasmodium falciparum ring-infected erythrocyte surface antigen (RESA). RESA, whose function is not known,
is associated with the membrane skeleton of newly invaded erythrocytes.

- Human HDJ1.

- Human HSJ1, a neuronal protein.

- Drosophila cysteine-string protein (csp).

[0490] A signature pattern for the 'J' domain was developed, based on conserved positions in the C-terminal half of
this domain. A pattern for the 'CRR' domain, based on the first two copies of that motif, was also developed. A profile
for the J domain was also developed.

[0491] Consensus pattern: [FY]-x(2)-[LIVMA]-x(3)-[FYWHNT]-[DENQSA]-x-L-x-[DN]-x(3)-[KR]-x(2)-[FYI]
Consensus pattern: C-[DEGSTHKR]-x-C-x-G-x-[GK]-[AGSDM]-x(2)-[GSNKR]-x(4,6)-C-x(2,3)-C-x-G-x-G

[1] Cyr D.M., Langer T., Douglas M.G. Trends Biochem. Sci. 19:176-181(1994).

[2] Bork P., Sander C., Valencia A., Bukau B. Trends Biochem. Sci. 17:129-129(1992).

[3] Ueguchi C., Kaneda M., Yamada H., Mizuno T. Proc. Natl. Acad. Sci. U.S.A. 91:1054-1058(1994).
[0492] 152.
[0493] 153. Dwarfin

[0494] This family known as the dwarfins also includes the Drosophila protein MAD. The N-terminus of MAD can
bind to DNA [2].

[0495] [1] Yingling JM, Das P, Savage C, Zhang M, Padgett RW, Wang XF. Proc Natl Acad Sci U S A 1996;93:
8940-8944. [2] Kim J, Johnson K, Chen HJ, Carroll S, Laughon A. Nature 1997;388:304-308.
[0496] 154. Dynein light chain type 1 signature 

Dynein is a multisubunit microtubule-dependent motor enzyme that acts as the force generating protein of eukaryotic
cilia and flagella. The cytoplasmic isoform of dynein acts as a motor for the intracellular retrograde motility of vesicles
and organelles along microtubules. Dynein is composed of a number of ATP-binding large subunits, intermediate size
subunits and small subunits. Among the small subunits, there is a family [1,2] of highly conserved proteins which consist
of: - Chlamydomonas reinhardtii flagellar outer arm dynein 8 Kd and 11 Kd light chains. - Higher eukaryotes cytoplasmic
dynein light chain 1. - Yeast cytoplasmic dynein light chain 1 (gene DYN2 or SLC1). - Caenorhabditis elegans
hypothetical dynein light chains M18.2 and T26A5.9. These proteins have from 89 to 120 amino acids. As a signature pattern,
a highly conserved region was selected.
Consensus pattern: H-x-I-x-G-[KR]-x-F-[GA]-S-x-V-[ST]-[HY]-E

[ 1] King S.M., Patel-King R.S. J. Biol. Chem. 270:11445-11452(1995).

[ 2] Dick T., Ray K., Salz H.K., Chia W. Mol. Cell. Biol. 16:1966-1977(1996).

[0497] 155. dUTPase 

[0498] dUTPase hydrolyzes dUTP to dUMP and pyrophosphate. 

[0499] [1] Cedergren-Zeppezauer ES, Larsson G, Nyman PO, Dauter Z, Wilson KS, Nature 1992;355:740-743. [2]
Mol CD, Harris JM, McIntosh EM, Tainer JA. Structure 1996;4:1077-1092.

[0500] 156. (dCMP_cyt_deam) Cytidine and deoxycytidylate deaminases zinc-binding region signature
Cytidine deaminase (EC 3.5.4.5) (cytidine aminohydrolase) catalyzes the hydrolysis of cytidine into uridine and
ammonia while deoxycytidylate deaminase (EC 3.5.4.12) (dCMP deaminase) hydrolyzes dCMP into dUMP. Both enzymes
are known to bind zinc and to require it for their catalytic activity [1,2]. These two enzymes do not share any sequence
similarity with the exception of a region that contains three conserved histidine and cysteine residues which are thought
to be involved in the binding of the catalytic zinc ion. Such a region is also found in other proteins [3,4]: - Yeast cytosine
deaminase (EC 3.5.4.1) (gene FCY1) which transforms cytosine into uracil. - Mammalian apolipoprotein B mRNA
editing protein, responsible for the posttranscriptional editing of a CAA codon into a UAA (stop) codon in the APOB
mRNA. - Riboflavin biosynthesis protein ribG, which converts 2,5-diamino-6-(ribosylamino)-4(3H)-pyrimidinone
5'-phosphate into 5-amino-6-(ribosylamino)-2,4(1H,3H)-pyrimidinedione 5'-phosphate. - Bacillus cereus blasticidin-S
deaminase (EC 3.5.4.23), which catalyzes the deamination of the cytosine moiety of the antibiotics blasticidin S,
cytomycin and acetylblasticidin S. - Bacillus subtilis protein comEB. This protein is required for the binding and uptake
of transforming DNA. - Bacillus subtilis hypothetical protein yaaJ. - Escherichia coli hypothetical protein yfhC. - Yeast
hypothetical protein YJL035c. A signature pattern for this zinc-binding region was derived.

[0501] Consensus pattern: [CH]-[AGV]-E-x(2)-[LIVMFGAT]-[LIVM]-x(17,33)-P-C-x(2,8)-C-x(3)-[LIVM] [The C's and
H are zinc ligands]

[ 1] Yang C., Carlow D., Wolfenden R., Short S.A. Biochemistry 31:4168-4174(1992).

[ 2] Moore J.T., Silversmith R.E., Maley G.F., Maley F. J. Biol. Chem. 268:2288-2291(1993).

[ 3] Reizer J., Buskirk S., Bairoch A., Reizer A., Saier M.H. Jr. Protein Sci. 3:853-856(1994).

[ 4] Bhattacharya S., Navaratnam N., Morrison J.R., Scott J., Taylor W.R. Trends Biochem. Sci. 19:105-106(1994).
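The zinc-binding signature contains two variable-length spacers, x(17,33) and x(2,8), so a match does not by itself reveal how many residues separate the zinc ligands. The following minimal sketch is illustrative only. It assumes the consensus pattern as read above, written as a Python regular expression with the two spacers captured as groups so that their actual lengths can be reported. The function name zinc_site_gaps and the toy sequence are hypothetical.

```python
import re

# Regex form of the zinc-binding consensus pattern (assumed reading); the two
# variable-length spacers are captured so their lengths can be inspected.
DEAMINASE = re.compile(
    r"[CH][AGV]E.{2}[LIVMFGAT][LIVM](.{17,33})PC(.{2,8})C.{3}[LIVM]"
)


def zinc_site_gaps(sequence):
    """Return the lengths of the two spacers, or None if there is no match."""
    m = DEAMINASE.search(sequence)
    if m is None:
        return None
    return len(m.group(1)), len(m.group(2))


# Invented toy sequence assembled from the pattern elements.
toy = "HAE" + "AA" + "LV" + "G" * 20 + "PC" + "AAA" + "C" + "AAA" + "L"
gaps = zinc_site_gaps(toy)
print("spacer lengths:", gaps)
```

Capturing the spacers is a convenience for inspection only; it does not change which sequences are matched by the pattern.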

[0502] 157. Dehydrins signatures 

A number of proteins are produced by plants that experience water-stress. Water-stress takes place when the water
available to a plant falls below a critical level. The plant hormone abscisic acid (ABA) appears to modulate the response
of plants to water-stress. Proteins that are expressed during water-stress are called dehydrins [1,2] or LEA group 2
proteins [3]. The proteins that belong to this family are listed below.

- Arabidopsis thaliana XERO 1, XERO 2 (LTI30), RAB18, ERD10 (LTI45), ERD14 and COR47.

- Barley dehydrins B8, B9, B17, and B18.

- Cotton LEA protein D-11.

- Craterostigma plantagineum desiccation-related proteins A and B.

- Maize dehydrin M3 (RAB-17).

- Pea dehydrins DHN1, DHN2, and DHN3.

- Radish LEA protein.

- Rice proteins RAB 16B, 16C, 16D, RAB21, and RAB25.

- Tomato TAS14.

- Wheat dehydrin RAB 15 and cold-shock proteins cor410, cs66 and cs120.

[0503] Dehydrins share a number of structural features. One of the most notable features is the presence, in their
central region, of a continuous run of five to nine serines followed by a cluster of charged residues. Such a region has
been found in all known dehydrins so far with the exception of pea dehydrins. A second conserved feature is the
presence of two copies of a lysine-rich octapeptide; the first copy is located just after the cluster of charged residues
that follows the poly-serine region and the second copy is found at the C-terminal extremity. Signature patterns for
both regions were derived.

[0504] Consensus pattern: S(5)-[DE]-x-[DE]-G-x(1,2)-G-x(0,1)-[KR](4)
Consensus pattern: [KR]-[LIM]-K-[DE]-K-[LIM]-P-G

[1] Close T.J., Kortt A.A., Chandler P.M. Plant Mol. Biol. 13:95-108(1989).
[2] Robertson M., Chandler P.M. Plant Mol. Biol. 19:1031-1044(1992).

[3] Dure L. III, Crouch M., Harada J., Ho T.-H. D., Mundy J., Quatrano R., Thomas T., Sung Z.R. Plant Mol. Biol.
12:475-486(1989).

[0505] 158. (deoR) Bacterial regulatory proteins, deoR family signature

The many bacterial transcription regulation proteins which bind DNA through a 'helix-turn-helix' motif can be classified into subfamilies on the basis of sequence similarities. One of these subfamilies groups the following proteins [1,2]:

- accR, the Agrobacterium tumefaciens plasmid pTiC58 repressor of opine catabolism and conjugal transfer.
- agaR, the Escherichia coli aga operon putative repressor.
- deoR, the Escherichia coli deoxyribose operon repressor.
- fucR, the Escherichia coli L-fucose operon activator.
- gatR, the Escherichia coli galactitol operon repressor.
- glpR, the Escherichia coli glycerol-3-phosphate regulon repressor.
- gutR (or srlR), the Escherichia coli glucitol operon repressor.
- iolR, from Bacillus subtilis.
- lacR, the streptococci lactose phosphotransferase system repressor.
- spoIIID, the Bacillus subtilis transcription regulator of the sigK gene.
- yfjR, an Escherichia coli hypothetical protein.
- ygbI, an Escherichia coli hypothetical protein.
- yihW, an Escherichia coli hypothetical protein.
- yjfQ, an Escherichia coli hypothetical protein.
- yjhJ, an Escherichia coli hypothetical protein.

The 'helix-turn-helix' DNA-binding motif of these proteins is located in the N-terminal part of the sequence. The pattern used to detect these proteins starts fourteen residues before the HTH motif and ends one residue after it.

[0506] Consensus pattern: R-x(3)-[LIVM]-x(3)-[LIVM]-x(16,17)-[STA]-x(2)-T-[LIVMA]-[RH]-[KRNA]-D-[LIVMP]

[ 1] von Bodman S., Hayman G.T., Farrand S.K. Proc. Natl. Acad. Sci. U.S.A. 89:643-647(1992).
[ 2] Bairoch A. Unpublished observations (1993).

[0507] 159. dsrm
Double-stranded RNA binding motif

[1] Burd CG, Dreyfuss G; Medline: 94310455. Conserved structures and diversity of functions of RNA-binding proteins. Science 1994;265:615-621.

[0508] Sequences gathered for seed by HMM_iterative_training. Putative motif shared by proteins that bind to dsRNA. At least some DSRM proteins seem to bind to specific RNA targets. Exemplified by Staufen, which is involved in localization of at least five different mRNAs in the early Drosophila embryo. Also by interferon-induced protein kinase in humans, which is part of the cellular response to dsRNA.
[0509] Number of members: 116
[0510] 160. Dynamin family signature 

Dynamin [1,2] is a microtubule-associated force-producing protein of 100 Kd which is involved in the production of microtubule bundles and which is able to bind and hydrolyze GTP. Dynamin is structurally related to the following proteins:

- Drosophila shibire protein (gene shi) [3]. Shibire is, very probably, the Drosophila cognate of mammalian dynamin. It seems to provide the motor for vesicular transport during endocytosis.
- Yeast vacuolar sorting protein VPS1 (or SPO15) [4], a protein which could also be involved in microtubule-associated motility.
- Yeast protein MGM1 [5], which is required for mitochondrial genome maintenance.
- Yeast protein DNM1, which is involved in endocytosis.
- Interferon induced Mx proteins [6,7]. Interferon alpha or beta induce the synthesis of a family of closely related proteins. Most of these proteins are known to confer resistance to influenza viruses and/or rhabdoviruses on transfected mammalian cells in culture.

The three motifs found in all GTP-binding proteins are located in the N-terminal part of these proteins. The signature pattern that was developed for these proteins is based on a highly conserved region downstream of the ATP/GTP-binding motif 'A' (P-loop) (see <PDOC00017>).
[0511] Consensus pattern: L-P-[RK]-G-[STN]-[GN]-[LIVM]-V-T-R

[ 1] Vallee R.B., Shpetner H.S. Annu. Rev. Biochem. 59:909-932(1990).
[ 2] Obar R.A., Collins C.A., Hammarback J.A., Shpetner H.S., Vallee R.B. Nature 347:256-261(1990).
[ 3] van der Bliek A., Meyerowitz E.M. Nature 351:411-414(1991).
[ 4] Rothman J.H., Raymond C.K., Gilbert T., O'Hara P.J., Stevens T.H. Cell 61:1063-1074(1990).
[ 5] Jones B.A., Fangman W.L. Genes Dev. 6:380-389(1992).
[ 6] Arnheiter H., Meier E. New Biol. 2:851-857(1990).
[ 7] Staeheli P., Pitossi F., Pavlovic J. Trends Cell Biol. 3:268-272(1993).

[0512] 161. (dynamin_2) Dynamin central region

[0513] This region lies between the GTPase domain, see dynamin, and the pleckstrin homology (PH) domain.
[0514] 162. E1-E2 ATPases phosphorylation site

E1-E2 ATPases (also known as P-type) are cation transport ATPases which form an aspartyl phosphate intermediate in the course of ATP hydrolysis. ATPases which belong to this family are listed below [1,2,3].

- Fungal and plant plasma membrane (H+) ATPases [reviewed in 4].
- Vertebrate (Na+, K+) ATPases (sodium pump) [reviewed in 5,6].
- Gastric (K+, H+) ATPases (proton pump).
- Calcium (Ca++) ATPases (calcium pump) from the sarcoplasmic reticulum (SR), the endoplasmic reticulum (ER) and the plasma membrane.
- Copper (Cu++) ATPases (copper pump) which are involved in two human genetic disorders: Menkes syndrome and Wilson disease [7].
- Bacterial potassium (K+) ATPases.
- Bacterial cadmium efflux (Cd++) ATPases [reviewed in 8].
- Bacterial magnesium (Mg++) ATPases.
- A probable cation ATPase from Leishmania.
- fixI, a probable cation ATPase from Rhizobium meliloti, involved in nitrogen fixation.

The region around the phosphorylated aspartate residue is perfectly conserved in all these ATPases and can be used as a signature pattern.

[0515] Consensus pattern: D-K-T-G-T-[LI1-[TI] [D is phosphorylated]

[ 1] Green N.M., McLennan D.H. Biochem. Soc. Trans. 17:819-822(1989).
[ 2] Green N.M. Biochem. Soc. Trans. 17:970-972(1989).
[ 3] Fagan M.J., Saier M.H. Jr. J. Mol. Evol. 38:57-99(1994).
[ 4] Serrano R. Biochim. Biophys. Acta 947:1-28(1988).
[ 5] Fambrough D.M. Trends Neurosci. 11:325-328(1988).
[ 6] Sweadner K.J. Biochim. Biophys. Acta 988:185-220(1989).
[ 7] Bull P.C., Cox D.W. Trends Genet. 10:246-251(1994).
[ 8] Silver S., Nucifora G., Chu L., Misra T.K. Trends Biochem. Sci. 14:76-80(1989).

[0516] 163. E1_N

E1 Protein, N terminal domain
Number of members: 90

[0517] 164. (E1_dehydrog) Dehydrogenase E1 component

[0518] This family uses thiamine pyrophosphate as a cofactor. This family includes pyruvate dehydrogenase, 2-oxoglutarate dehydrogenase and 2-oxoisovalerate dehydrogenase.
[0519] 165. (ECH) Enoyl-CoA hydratase/isomerase signature

Enoyl-CoA hydratase (EC 4.2.1.17) (ECH) [1] and 3-2trans-enoyl-CoA isomerase (EC 5.3.3.8) (ECI) [2] are two enzymes involved in fatty acid metabolism. ECH catalyzes the hydration of 2-trans-enoyl-CoA into 3-hydroxyacyl-CoA and ECI shifts the 3- double bond of the intermediates of unsaturated fatty acid oxidation to the 2-trans position. Most eukaryotic cells have two fatty-acid beta-oxidation systems, one located in mitochondria and the other in peroxisomes. In mitochondria, ECH and ECI are separate yet structurally related monofunctional enzymes. Peroxisomes contain a trifunctional enzyme [3] consisting of an N-terminal domain that bears both ECH and ECI activity, and a C-terminal domain responsible for 3-hydroxyacyl-CoA dehydrogenase (HCDH) activity. In Escherichia coli (gene fadB) and Pseudomonas fragi (gene faoA), ECH and ECI are also part of a multifunctional enzyme which contains both a HCDH and a 3-hydroxybutyryl-CoA epimerase domain [4]. A number of other proteins have been found to be evolutionarily related to the ECH/ECI enzymes or domains:

- 3-hydroxybutyryl-CoA dehydratase (EC 4.2.1.55) (crotonase), a bacterial enzyme involved in the butyrate/butanol-producing pathway.
- Naphthoate synthase (EC 4.1.3.36) (DHNA synthetase) (gene menB) [5], a bacterial enzyme involved in the biosynthesis of menaquinone (vitamin K2). DHNA synthetase converts O-succinyl-benzoyl-CoA (OSB-CoA) to 1,4-dihydroxy-2-naphthoic acid (DHNA).
- 4-chlorobenzoate dehalogenase (EC 3.8.1.6) [6], a Pseudomonas enzyme which catalyzes the conversion of 4-chlorobenzoate-CoA to 4-hydroxybenzoate-CoA.
- A Rhodobacter capsulatus protein of unknown function (ORF257) [7].
- Bacillus subtilis putative polyketide biosynthesis proteins pksH and pksI.
- Escherichia coli carnitine racemase (gene caiD) [8].
- Escherichia coli hypothetical protein ygfG.
- Yeast hypothetical protein YDR036c.

As a signature pattern for these enzymes, a conserved region rich in glycine and hydrophobic residues was selected.

[ 1] Minami-Ishii N., Taketani S., Osumi T., Hashimoto T. Eur. J. Biochem. 185:73-78(1989).
[ 2] Mueller-Newen G., Stoffel W. Biol. Chem. Hoppe-Seyler 372:613-624(1991).
[ 3] Palosaari P.M., Hiltunen J.K. J. Biol. Chem. 265:2446-2449(1990).
[ 4] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:4937-4937(1990).
[ 5] Driscoll J.R., Taber H.W. J. Bacteriol. 174:5063-5071(1992).
[ 6] Babbitt P.C., Kenyon G.L., Martin B.M., Charest H., Sylvestre M., Scholten J.D., Chang K.-H., Liang P.-H., Dunaway-Mariano D. Biochemistry 31:5594-5604(1992).
[ 7] Beckman D.L., Kranz R.G. Gene 107:171-172(1991).
[ 8] Eichler K., Bourgis F., Buchet A., Kleber H.-P., Mandrand-Berthelot M.-A. Mol. Microbiol. 13:775-786(1994).
[0521] 166. (EF1BD) Elongation factor 1 beta/beta'/delta chain signatures

p?T^"°" '^'J f""^* responsible lor the GXP^Jependent binding of aminoacy^tRNAs to the ribos 
? 1 ^. : S."*"*^^ °* ^ **ich binds GTP Li ami^S^kRN^ ml 1^^" 

chain thatprobably plays a role in anchoring the complex to other cellular cornpone^a^ Stell^lt 

Consensus pattern: [IV]-Q-S-x-D-[LIVM]-x-A-[FWM]-[NQ]-K-[LIVM]

[ 1] Riis B., Rattan I.S., Clark B.F.C., Merrick W.C. Trends Biochem. Sci. 15:420-424(1990).

[0526] 169. (EFP) Elongation factor P signature

Consensus pattern: K-x-[AV]-x(4)-G-x(2)-[LIV]-x-V-P-x(2)-[LIV]-x(2)-G

[^^2 Consensus pattern: l--R-x(2)-T-(GSDNQ).x4GSMLIVMF]-x(0.l )-fDENKAG1-x-K-IKRNFO<5t a i 
[0530J Consensus pattern: E-[LIVMHNV]-[SCV]-|QEJ-T-D-F-V-(SAHKRNJ f'<'^NEQShA-L- 

[ 1] Bubunenko M.G., Kireeva M.L., Gudkov A.T. Biochimie 74:419-425(1992).
[ 2] Kostrzewa M., Zetsche K. Plant Mol. Biol. 23:67-76(1993).
[ 3] Xin H., Woriax V.L., Burkhart W.A., Spremulli L.L. J. Biol. Chem. 270:17243-17249(1995).
[0531] 171. (EMP24_GP25L) emp24/gp25L/p24 family

[0532] Members of this family are implicated in bringing cargo forward from the ER and binding to coat proteins by their cytoplasmic domains. Number of members: 30
[0533] Paccaud JP, Thomas DY, Bergeron JJ, Nilsson T, J Cell Biol 1998;140:751-765.

172. ENV_polyprotein

ENV polyprotein (coat polyprotein)

Number of members: 224

[0534] 173. (ERG4_ERG24) Ergosterol biosynthesis ERG4/ERG24 family signatures 

Two fungal enzymes involved in ergosterol biosynthesis and which act by reducing double bonds in precursors of ergosterol have been shown to be evolutionarily related [1]. These are C-14 sterol reductase (gene ERG24 in budding yeast and erg3 in Neurospora crassa) and C-24(28) sterol reductase (gene ERG4 in budding yeast and sts1 in fission yeast). Their sequences are also highly related to that of chicken lamin B receptor, which is thought to anchor the lamina to the inner nuclear membrane. These proteins are highly hydrophobic and seem to contain seven or eight transmembrane regions. As signature patterns, two conserved regions were selected. The first one is apparently located in a loop between the fourth and fifth transmembrane regions and the second is in the C-terminal section.
[0535] Consensus pattern: G-x(2)-[LIVM]-[YH]-D-x-[FYW]-x-G-x(2)-L-N-P-R
Consensus pattern: [LIVM](2)-H-R-x(2)-R-D-x(3)-C-x(2)-K-Y-G

[ 1] Lai M.H., Bard M., Pierson C.A., Alexander J.F., Goebl M., Carter G.T., Kirsch D.R. Gene 140:41-49(1994).
[0536] 174. (ERM) Ezrin/radixin/moesin family

[0537] This family of proteins contains a band 4.1 domain (Band_41) at their amino terminus. This family represents the rest of these proteins.

[0538] [1] Yonemura S, Hirao M, Doi Y, Takahashi N, Kondo T, Tsukita S. J Cell Biol 1998;140:885-895.
[0539] 175. ER lumen protein retaining receptor signatures 

Proteins that reside in the lumen of the endoplasmic reticulum (ER) contain a C-terminal tetrapeptide (generally K-D-E-L or H-D-E-L) that serves as a signal for their retrieval (retrograde transport) from subsequent compartments of the secretory pathway. The signal is recognized by a receptor molecule that is believed to cycle between the cis side of the Golgi apparatus and the ER [1]. This protein is known as the ER lumen protein retaining receptor or also as the 'KDEL receptor'. It has been characterized in a variety of species, including fungi (gene ERD2), plants, Plasmodium, Drosophila and mammals. In mammals two highly related forms of the receptor are known. Structurally, the receptor is a protein of about 220 residues that seems to contain seven transmembrane regions [2]. The N-terminal part (3 residues) is oriented toward the lumen while the C-terminal tail (about 12 residues) is cytoplasmic. There are three lumenal and three cytoplasmic loops. Two signature patterns for these receptors were developed. The first pattern corresponds to the C-terminal half of the first cytoplasmic loop as well as most of the second transmembrane domain. The second pattern is a perfectly conserved decapeptide that corresponds to the central part of the fifth transmembrane domain.

[0540] Consensus pattern: G-I-S-x-[KR]-x-Q-x-L-[FY]-x-[LIV](2)-F-x(2)-R-Y
Consensus pattern: L-E-[SA]-V-A-I-[LM]-P-Q-L
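Consensus patterns written in this notation can be translated mechanically into regular expressions for scanning sequences. The following is a minimal sketch, assuming the usual reading of the notation (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repetition) and assuming the decapeptide above reads L-E-[SA]-V-A-I-[LM]-P-Q-L. The function name and the test sequence are hypothetical.

```python
import re

# One pattern element: a residue, x, [allowed] or {excluded},
# optionally followed by a repeat count (n) or range (n,m).
_ELEMENT = re.compile(
    r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip("-. ").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except these
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(core + repeat)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("L-E-[SA]-V-A-I-[LM]-P-Q-L")
    # Hypothetical sequence containing one occurrence of the decapeptide.
    hit = re.search(regex, "MKTLEAVAIMPQLGG")
    print(regex, hit.span() if hit else None)
```

Bracketed annotations that follow some patterns in this document (for example a note naming the modified residue) are not part of the pattern and would need to be stripped before conversion.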

[ 1] Pelham H.R.B. Curr. Opin. Cell Biol. 3:585-591(1991).

[ 2] Townsley F.M., Wilson D.W., Pelham H.R.B. EMBO J. 12:2821-2829(1993).

[0541] 176. (ETF_beta) Electron transfer flavoprotein beta-subunit signature

The electron transfer flavoprotein (ETF) [1,2] serves as a specific electron acceptor for various mitochondrial dehydrogenases. ETF transfers electrons to the main respiratory chain via ETF-ubiquinone oxidoreductase. ETF is a heterodimer that consists of an alpha and a beta subunit and which binds one molecule of FAD per dimer. A similar system also exists in some bacteria. The beta subunit of ETF is a protein of about 28 Kd which is structurally related to the bacterial nitrogen fixation protein fixA which could play a role in a redox process and feed electrons to ferredoxin. Other related proteins are:

- Escherichia coli hypothetical protein ydiQ.
- Escherichia coli hypothetical protein ygcR.

As a signature pattern for these proteins, a conserved region which is located in the central section was selected.
[0542] Consensus pattern: [IVA]-x-[KR]-x(2)-[DE]-[GD]-[GDE]-x(1,2)-[EQ]-x-[LIV]-x(4)-P-x-[LIVM](2)-[TAC]

[ 1] Finocchiaro G., Ikeda Y., Ito M., Tanaka K. Prog. Clin. Biol. Res. 321:637-652(1990).
[ 2] Tsai M.H., Saier M.H. Jr. Res. Microbiol. 146:397-404(1995).

[0543] 177. Endonuclease III signatures

Escherichia coli endonuclease III (EC 4.2.99.18) (gene nth) [1] is a DNA repair enzyme that acts both as a DNA N-glycosylase, removing oxidized pyrimidines from DNA, and as an apurinic/apyrimidinic (AP) endonuclease, introducing a single-strand nick at the site from which the damaged base was removed. Endonuclease III is an iron-sulfur protein that binds a single 4Fe-4S cluster. The 4Fe-4S cluster does not seem to be important for catalytic activity, but is probably involved in the proper positioning of the enzyme along the DNA strand [2]. Endonuclease III is evolutionarily related to the following proteins:

- Fission yeast endonuclease III homolog (gene nth1) [3].
- Escherichia coli and related protein DNA repair protein mutY, which is an adenine glycosylase. MutY is a larger protein (350 amino acids) than endonuclease III (211 amino acids).
- Micrococcus luteus ultraviolet N-glycosylase/AP lyase which initiates repair at cis-syn pyrimidine dimers.
- ORF10 in plasmid pFV1 of the thermophilic archaebacteria Methanobacterium thermoformicicum [4]. Restriction methylase m.MthTI, which is encoded by this plasmid, generates 5-methylcytosine which is subject to deamination resulting in G-T mismatches. This protein could correct these mismatches.
- Yeast hypothetical protein YAL015c.
- Fission yeast hypothetical protein SpAC26A3.02.
- Caenorhabditis elegans hypothetical protein R10E4.5.
- Methano-

The 4Fe-4S cluster is bound by four cysteines which are all located in a 17 amino acid region at the C-terminal end of endonuclease III. A similar region is also present in the central section of mutY and in the C-terminus of ORF10 and of the Micrococcus UV endonuclease. The 4Fe-4S cluster region does not exist in YAL015c. Two signature patterns for these proteins were developed: the first corresponds to the core of the iron-sulfur binding domain, the second corresponds to the best conserved region in the catalytic core of these enzymes.

[0544] Consensus pattern: C-x(3)-[KRS]-P-[KRAGL]-C-x(2)-C-x(5)-C [The four C's are 4Fe-4S ligands]

[ 1] Kuo C.-F., McRee D.E., Fisher C.L., O'Handley S.F., Cunningham R.P., Tainer J.A. Science 258:434-440(1992).
[ 2] Thomson A.J. Curr. Biol. 3:173-174(1993).
[ 3] Roldan-Arjona T., Anselmino C., Lindahl T. Nucleic Acids Res. 3307-3312(1996).
[ 4] Noelling J., van Eeden F.J.M., Eggen R.I.L., de Vos W.M. Nucleic Acids Res. 20:6501-6507(1992).

[0545] 178. (Epimerase) NAD dependent epimerase/dehydratase family 

[0546] This family of proteins utilizes NAD as a cofactor. The proteins in this family use nucleotide-sugar substrates for a variety of chemical reactions.

fSIisiA ^'^^'^ Wesenberg G. Chapeau MC. Frey PA. Holden HM. Biochemistry 1997.36: 

[0548] 179. Exonuclease 

[0549] This family includes a variety of exonuclease proteins, such as ribonuclease T and the epsilon subunit of DNA polymerase III.

[0550] [1] Koonin EV, Deutscher MP. Nucleic Acids Res 1993;21:2521-2522.
[0551] 180. ENTH
ENTH domain

[0552] [1] Kay BK, Yamabhai M, Wendland B, Emr SD; Medline: 99156083. Identification of a novel domain shared by putative components of the endocytic and cytoskeletal machinery. Protein Sci 1999;8:435-438.
[0553] The ENTH (Epsin N-terminal homology) domain is found in proteins involved in endocytosis and cytoskeletal machinery. The function of the ENTH domain is unknown.
[0554] Number of members: 29 

[0555] 181. (eIF-1A) Eukaryotic initiation factor 1A signature

Eukaryotic translation initiation factor 1A (eIF-1A) [1] (formerly known as eIF-4C) is a protein that seems to be required for maximal rate of protein biosynthesis. It enhances ribosome dissociation into subunits and stabilizes the binding of the initiator Met-tRNA to 40S ribosomal subunits. eIF-1A is a hydrophilic protein of about 15 to 17 Kd. As a signature pattern, a conserved region in the central section of these proteins was selected.

[0556] Consensus pattern: [IM]-x-G-x-[GS]-[KRH]-x(4)-[CL]-x-D-G-x(2)-R-x(2)-[RH]-I-x-G
[0557] [ 1] Wei C.-L., Kainuma M., Hershey J.W.B. J. Biol. Chem. 270:22788-22794(1995).
[0558] 182. (eIF-5A) Eukaryotic initiation factor 5A hypusine signature

Eukaryotic initiation factor 5A (eIF-5A) (formerly known as eIF-4D) [1,2] is a small protein whose precise role in the initiation of protein synthesis is not known. It appears to promote the formation of the first peptide bond. eIF-5A seems to be the only eukaryotic protein to contain a hypusine residue. Hypusine is derived from lysine by the post-translational addition of a butylamino group (from spermidine) to the epsilon-amino group of lysine. The hypusine group is essential to the function of eIF-5A. A hypusine-containing protein has been found in archaebacteria such as Sulfolobus acidocaldarius or Methanococcus jannaschii; this protein is highly similar to eIF-5A and could play a similar role in protein biosynthesis. The signature developed for eIF-5A is centered around the hypusine residue.
[0559] Consensus pattern: [PT]-G-K-H-G-x-A-K [The first K is modified to hypusine]
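Because the annotation names a specific residue within the pattern, a match can be used to report the position of the candidate hypusine site. The sketch below assumes the pattern reads [PT]-G-K-H-G-x-A-K, so that the modified lysine is the third residue of each match; the function name and the short test sequence are hypothetical and not taken from any real protein.

```python
import re

# Regular-expression form of [PT]-G-K-H-G-x-A-K (x = any residue).
EIF5A_SIGNATURE = re.compile(r"[PT]GKHG.AK")

# The first K of the pattern is its third residue.
FIRST_K_OFFSET = 2


def hypusine_sites(sequence):
    """Return 1-based positions of the lysine expected to carry hypusine."""
    return [
        match.start() + FIRST_K_OFFSET + 1  # convert 0-based index to 1-based
        for match in EIF5A_SIGNATURE.finditer(sequence)
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence with one signature occurrence.
    print(hypusine_sites("AAATGKHGHAKCC"))
```

A match only identifies a candidate site; whether the lysine is actually modified cannot be read from the sequence alone.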

[ 1] Park M.H., Wolff E.C., Folk J.E. Biofactors 4:95-104(1993).

Smit-McBride Z., Kang H.A., Hershey J.W.B. Mol. Cell. Biol. 11:3105-3114
[0560] 183. (efhand) S-100/ICaBP type calcium binding protein signature

S-100 are small dimeric acidic calcium and zinc-binding proteins [1] abundant in the brain. They have two different types of calcium-binding sites: a low affinity one with a special structure and a 'normal' EF-hand type high affinity site. The vitamin-D dependent intestinal calcium-binding proteins (ICaBP or calbindin 9 Kd) also belong to this family of proteins, but they do not form dimers. In the past years the sequences of many new members of this family have been determined (for reviews see [2,3,4]); in most cases the function of these proteins is not yet known, although it is becoming clear that they are involved in cell growth and differentiation, cell cycle regulation and metabolic control. These proteins are:

- Calcyclin (Prolactin receptor associated protein (PRA); clatropin; 2a9; 5B10; S100A6).
- Calpactin I light chain (p10; p11; 42c; S100A10).
- Calgranulin A (cystic fibrosis antigen (CFAg); MIF related protein 8 (MRP-8); p8; S100A8).
- Calgranulin B (MIF related protein 14 (MRP-14); p14; S100A9).
- Calgranulin C.
- Calgizzarin (S100C).
- Placental calcium-binding protein (CAPL) (18a2; peL98; 42a; p9K; MTS1; metastatin; S100A4).
- Protein S-100D (S100A5).
- Protein S-100E (S100A3).
- Protein S-100L (CAN19; S100A2).
- Placental protein S-100P (S100E).
- Psoriasin (S100A7).
- Chemotactic cytokine CP-10 [5].
- Protein MRP-126 [6].
- Trichohyalin [7]. This is a large intermediate filament-associated protein that associates with keratin intermediate filaments (KIF); it contains a S-100 type domain in its N-terminal extremity.

A number of these proteins are known to bind calcium while others are not (p10 for example). Our EF-hand detecting pattern will fail to pick those proteins which have lost their calcium-binding properties. A pattern was developed which unambiguously picks up proteins belonging to this family. This pattern spans the region of the EF-hand high affinity site but makes no assumptions on the calcium-binding properties of this site.
[0561] Consensus pattern: [LIVMFYW](2)-x(2)-[LK]-D-x(3)-[DN]-x(3)-[DNSG]-[FY]-x-[ES]-[FYVC]-x(2)-[LIVMFS]-[LIVMF]

[ 1] Baudier J. (In) Calcium and Calcium Binding proteins, Gerday C., Bollis L., Giller R., Eds., pp102-113, Springer Verlag, Berlin, (1988).
[ 2] Moncrief N.D., Kretsinger R.H., Goodman M. J. Mol. Evol. 30:522-562(1990).
[ 3] Kligman D., Hilt D.C. Trends Biochem. Sci. 13:437-443(1988).
[ 4] Schaefer B.W., Wicki R., Engelkamp D., Mattei M.-G., Heizmann C.W. Genomics 25:638-643(1995).
[ 5] Lackmann M., Cornish C.J., Simpson R.J., Moritz R.L., Geczy C.L. J. Biol. Chem. 267:7499-7504(1992).
[ 6] Nakano T., Graf T. Oncogene 7:527-534(1992).
[ 7] Lee S.-C., Kim I.-G., Marekov L.N., O'Keefe E.J., Parry D.A.D., Steinert P.M. J. Biol. Chem. 268:12164-12176(1993).

EF-hand calcium-binding domain 

Many calcium-binding proteins belong to the same evolutionary family and share a type of calcium-binding domain known as the EF-hand [1 to 5]. This type of domain consists of a twelve residue loop flanked on both sides by a twelve residue alpha-helical domain. In an EF-hand loop the calcium ion is coordinated in a pentagonal bipyramidal configuration. The six residues involved in the binding are in positions 1, 3, 5, 7, 9 and 12; these residues are denoted by X, Y, Z, -Y, -X and -Z. The invariant Glu or Asp at position 12 provides two oxygens for liganding Ca (bidentate ligand). Listed below are the proteins which are known to contain EF-hand regions. For each type of protein the total number of EF-hand regions known or supposed to exist is indicated between parentheses. This number does not include regions which clearly have lost their calcium-binding properties, or the atypical low-affinity site (which spans thirteen residues) found in the S-100/ICaBP family of proteins [6].

- Aequorin and Renilla luciferin binding protein (LBP) (Ca=3).
- Alpha actinin (Ca=2).
- Calbindin (Ca=4).
- Calcineurin B subunit (protein phosphatase 2B regulatory subunit) (Ca=4).
- Calcium-binding protein from Streptomyces erythraeus (Ca=3?).
- Calcium-binding protein from Schistosoma mansoni (Ca=2?).
- Calcium-binding proteins TCBP-23 and TCBP-25 from Tetrahymena thermophila (Ca=4?).
- Calcium-dependent protein kinases (CDPK) from plants (Ca=4).
- Calcium vector protein from amphioxus (Ca=2).
- Calcyphosin (thyroid protein p24) (Ca=4?).
- Calmodulin (Ca=4, except in yeast where Ca=3).
- Calpain small and large chains (Ca=2).
- Calretinin (Ca=6).
- Calcyclin (prolactin receptor associated protein) (Ca=2).
- Caltractin (centrin) (Ca=2 or 4).
- Cell Division Control protein 31 (gene CDC31) from yeast (Ca=2?).
- Diacylglycerol kinase (EC 2.7.1.107) (DGK) (Ca=2).
- FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) from mammals (Ca=1).
- Fimbrin (plastin)
- Flagellar calcium-binding protein (1f8) from Trypanosoma cruzi (Ca=1 or 2).
- Guanylate cyclase activating protein (GCAP) (Ca=3).
- MIF related proteins 8 (MRP-8 or CFAG) and 14 (MRP-14) (Ca=2).
- Myosin regulatory light chains (Ca=1).
- Oncomodulin (Ca=2).
- Osteonectin (basement membrane protein BM-40) (SPARC) and proteins that contain an 'osteonectin' domain (QR1, matrix glycoprotein SC1) (see the entry <PDOC00535>) (Ca=1).
- Parvalbumins alpha and beta
- Placental calcium-binding protein (18a2) (nerve growth factor induced protein 42a) (p9k) (Ca=2).
- Recoverins (visinin, hippocalcin, neurocalcin, S-modulin) (Ca=2 to 3).
- Reticulocalbin (Ca=4).
- S-100 protein, alpha and beta chains (Ca=2).
- Sarcoplasmic calcium-binding protein (SCPs) (Ca=2 to 3).
- Sea urchin proteins Spec 1 (Ca=4), Spec 2 (Ca=4?), Lps-1 (Ca=8).
- Squidulin (optic lobe calcium-binding protein) from squid (Ca=4).
- Troponins C; from skeletal muscle (Ca=4), from cardiac muscle (Ca=3), from arthropods and molluscs (Ca=2).

There have been a number of attempts [7,8] to detect EF-hand regions, but these studies were made a few years ago when not so many different families of calcium-binding proteins were known. Therefore a new pattern was developed which takes into account all published sequences. This pattern includes the complete EF-hand loop as well as the first residue which follows the loop and which seems to always be hydrophobic.

Consensus pattern: D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-[DE]-[LIVMFYW]

- Note: positions 1 (X), 3 (Y) and 12 (-Z) are the most conserved.

- Note: the 6th residue in an EF-hand loop is in most cases a Gly, but the number of exceptions to this 'rule' has gradually increased and therefore the pattern should include all the different residues which have been shown to exist in this position in functional Ca-binding sites.

- Note: the pattern will, in some cases, miss one of the EF-hand regions in some proteins with multiple EF-hand domains.
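The loop geometry described for the EF-hand (coordinating residues at loop positions 1, 3, 5, 7, 9 and 12, labelled X, Y, Z, -Y, -X and -Z, with Glu or Asp at position 12) can be illustrated with a small helper that pulls those residues out of a twelve-residue loop. This is a sketch only: the function names are hypothetical and the loop sequence is an illustrative assumption, not data from this document.

```python
# 1-based loop positions of the six calcium-coordinating residues and the
# labels given to them in the text.
LIGAND_POSITIONS = {1: "X", 3: "Y", 5: "Z", 7: "-Y", 9: "-X", 12: "-Z"}


def ef_hand_ligands(loop):
    """Map each coordination label to the residue found at that loop position."""
    if len(loop) != 12:
        raise ValueError("an EF-hand loop is expected to have twelve residues")
    return {label: loop[pos - 1] for pos, label in LIGAND_POSITIONS.items()}


def has_bidentate_ligand(loop):
    """True when position 12 (-Z) is Glu or Asp, the bidentate calcium ligand."""
    return ef_hand_ligands(loop)["-Z"] in ("D", "E")


if __name__ == "__main__":
    loop = "DKDGDGTITTKE"  # illustrative twelve-residue loop (assumed)
    print(ef_hand_ligands(loop))
    print(has_bidentate_ligand(loop))
```

A loop that fails the position 12 check is a candidate for the class of regions that have lost their calcium-binding properties, although the check alone is not sufficient to decide that.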

LlS^^S^g^rt^^^^^ ^^'^^^^^^^J f 2J X-'^^S- B.H. cold Spr.g Harbor Symp. 

[ 3] Moncrief N.D., Kretsinger R.H., Goodman M. J. Mol. Evol. 30:522-562(1990).
[ 4] Nakayama S., Moncrief N.D., Kretsinger R.H. J. Mol. Evol. 34:416-448(1992).
[ 5] Heizmann C.W., Hunziker W. Trends Biochem. Sci. 16:98-103(1991).
[ 6] Kligman D., Hilt D.C. Trends Biochem. Sci. 13:437-443(1988).
[ 7] Strynadka N.C.J., James M.N.G. Annu. Rev. Biochem. 58:951-98(1989).
[ 8] Haiech J., Sallantin J. Biochimie 67:555-560(1985).
[ 9] Chauvaux S., Beguin P., Aubert J.-P., Bhat K.M., Gow L.A., Wood T.M., Bairoch A. Biochem. J. 265:261-265
[10] Bairoch A., Cox J.A. FEBS Lett. 269:454-456(1990).
[0562] 184. Enolase signature

Enolase (EC 4.2.1.11) is a glycolytic enzyme that catalyzes the dehydration of 2-phospho-D-glycerate to phosphoenolpyruvate [1]. It is a dimeric enzyme that requires magnesium both for catalysis and for stabilizing the dimer. In vertebrates, there are three different tissue-specific isozymes: alpha present in most tissues, beta in muscles and gamma found only in nervous tissues. Tau-crystallin, one of the major lens proteins in some fish, reptiles and birds, has been shown [2] to be evolutionarily related to enolase.

[0563] Consensus pattern: [LIV](3)-K-x-N-Q-I-G-[ST]-[LIV]-[ST]-[DE]-[STA]
[ 1] Lebioda L., Stec B., Brewer J.M. J. Biol. Chem. 264:3685-3693(1989).
[ 2] Wistow G., Piatigorsky J. Science 236:1554-1556(1987).
[0564] 185. (F-actin_cap_A) F-actin capping protein alpha subunit signatures

The F-actin capping protein binds in a calcium-independent manner to the fast growing ends of actin filaments (barbed end) thereby blocking the exchange of subunits at these ends. Unlike gelsolin and severin this protein does not sever actin filaments. The F-actin capping protein is a heterodimer composed of two unrelated subunits: alpha and beta. The alpha subunit is a protein of about 268 to 288 amino acid residues whose sequence is well conserved in eukaryotic species [1]. As signature patterns two highly conserved regions in the C-terminal section of the alpha subunit were selected.

[0565] Consensus pattern: V-H-[FY](2)-E-D-G-N-V
Consensus pattern: F-K-[AE]-L-R-R-x-L-P

[0566] [ 1] Cooper J.A., Caldwell J.E., Gattermeir D.J., Torres M.A., Amatruda J.F., Casella J.F. Cell Motil. Cytoskeleton 18:204-214(1991).
[0567] 186. F-box domain

[0568] [1] Bai C, Sen P, Hofmann K, Ma L, Goebl M, Harper JW, Elledge SJ. Cell 1996;86:263-274. [2] Skowyra D, Craig KL, Tyers M, Elledge SJ, Harper JW. Cell 1997;91:209-219.
[0569] 187. F-protein
Negative factor, (F Protein) or Nef.

[0570] [1] Arold S, Franken P, Strub M-P, Hoh F, Benichou S, Benarous R, Dumas C; Medline: 98035457. The crystal structure of HIV-1 Nef protein bound to the Fyn kinase SH3 domain suggests a role for this complex in altered T cell receptor signalling. Structure 1997;5:1361-1372.

[0571] Nef protein accelerates virulent progression of AIDS by its interaction with cellular proteins involved in signal transduction and host cell activation. Nef has been shown to bind specifically to a subset of the Src kinase family.
[0572] Number of members: 1013 
[0573] 188. (FAD_binding_2)

Fumarate reductase / succinate dehydrogenase FAD-binding site

In bacteria two distinct, membrane-bound, enzyme complexes are responsible for the interconversion of fumarate and succinate (EC 1.3.99.1): fumarate reductase (Frd) is used in anaerobic growth, and succinate dehydrogenase (Sdh) is used in aerobic growth. Both complexes consist of two main components: a membrane-extrinsic component composed of a FAD-binding flavoprotein and an iron-sulfur protein; and a hydrophobic component composed of a membrane anchor protein and/or a cytochrome B.
[0574] In eukaryotes mitochondrial succinate dehydrogenase (ubiquinone) (EC 1.3.5.1) is an enzyme composed of two subunits: a FAD flavoprotein and an iron-sulfur protein.

[0575] The flavoprotein subunit is a protein of about 60 to 70 Kd to which FAD is covalently bound to a histidine residue which is located in the N-terminal section of the protein [1]. The sequence around that histidine is well conserved in Frd and Sdh from various bacterial and eukaryotic species [2] and can be used as a signature pattern.
[0576] Consensus pattern: R-[ST]-H-[ST]-x(2)-A-x-G-G [H is the FAD binding site]. Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Blaut M., Whittaker K., Valdovinos A., Ackrell B.A., Gunsalus R.P., Cecchini G. J. Biol. Chem. 264:13599-13604(1989).

[ 2] Birch-Machin M.A., Farnsworth L., Ackrell B.A., Cochran B., Jackson S., Bindoff L.A., Aitken A., Diamond A.G., Turnbull D.M. J. Biol. Chem. 267:11553-11558(1992).

[0577] 189. Fatty acid desaturases signatures (FA_desaturase)

Fatty acid desaturases (EC 1.14.99.-) are enzymes that catalyze the insertion of a double bond at the delta position
of fatty acids. There seem to be two distinct families of fatty acid desaturases which do not seem to be evolutionarily
related. Family 1 is composed of: - Stearoyl-CoA desaturase (SCD) (EC 1.14.99.5) [1]. SCD is a key regulatory enzyme
of unsaturated fatty acid biosynthesis. SCD introduces a cis double bond at the delta(9) position of fatty acyl-CoA's
such as palmitoleoyl- and oleoyl-CoA. SCD is a membrane-bound enzyme that is thought to function as a part of a
multienzyme complex in the endoplasmic reticulum of vertebrates and fungi. As a signature pattern for this family a
conserved region in the C-terminal part of these enzymes was selected; this region is rich in histidine residues and in
aromatic residues. Family 2 is composed of: - Plant stearoyl-acyl-carrier-protein desaturase (EC 1.14.99.6) [2]; these
enzymes catalyze the introduction of a double bond at the delta(9) position of stearoyl-ACP to produce oleoyl-ACP.
This enzyme is responsible for the conversion of saturated fatty acids to unsaturated fatty acids in the synthesis of
vegetable oils. - Cyanobacteria desA [3], an enzyme that can introduce a second cis double bond at the delta(12)
position of fatty acid bound to membrane glycerolipids. DesA is involved in chilling tolerance; the phase transition
temperature of lipids of cellular membranes being dependent on the degree of unsaturation of fatty acids of the
membrane lipids. As a signature pattern for this family a conserved region in the C-terminal part of these enzymes was
selected.

[0578] Consensus pattern: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y
Consensus pattern: [ST]-[SA]-x(3)-[QR]-[LI]-x(5,6)-D-Y-x(2)-[LIVMFYW]-[LIVM]-[DE]

[ 1] Kaestner K.H., Ntambi J.M., Kelly T.J. Jr., Lane M.D. J. Biol. Chem. 264:14755-14761(1989).
[ 2] Shanklin J., Somerville C.R. Proc. Natl. Acad. Sci. U.S.A. 88:2510-2514(1991).
[ 3] Wada H., Gombos Z., Murata N. Nature 347:200-203(1990).

[0579] 190. Fructose-1-6-bisphosphatase active site (FBPase)

Fructose-1,6-bisphosphatase (EC 3.1.3.11) (FBPase) [1], a regulatory enzyme in gluconeogenesis, catalyzes the
hydrolysis of fructose 1,6-bisphosphate to fructose 6-phosphate. It is involved in many different pathways and
found in most organisms. Sedoheptulose-1,7-bisphosphatase (EC 3.1.3.37) (SBPase) [2] is an enzyme that catalyzes
the hydrolysis of sedoheptulose 1,7-bisphosphate to sedoheptulose 7-phosphate, a step in the Calvin's reductive
pentose phosphate cycle. It is functionally and structurally related to FBPase. In mammalian FBPase, a lysine residue
has been shown to be involved in the catalytic mechanism [3]. The region around this residue is highly conserved and
can be used as a signature pattern for FBPase and SBPase. It must be noted that in some bacterial FBPase sequences,
the active site lysine is replaced by an arginine.
Consensus pattern: [AG]-[RK]-L-x(1,2)-[LIV]-[FY]-E-x(2)-P-[LIVM]-[GSA] [K/R is the active site residue]

[ 1] Benkovic S.J., DeMaine M.M. Adv. Enzymol. 53:45-82(1982).

[ 2] Raines C.A., Lloyd J.C., Willingham N.M., Potts S., Dyer T.A. Eur. J. Biochem. 205:1053-1059(1992).

[ 3] Ke H., Thorpe C.M., Seaton B.A., Lipscomb W.N., Marcus F. J. Mol. Biol. 212:513-539(1989).

[0580] 191. FGGY family of carbohydrate kinases signatures

It has been shown [1] that four different types of carbohydrate kinases seem to be evolutionarily related. These enzymes
are: - L-fucolokinase (EC 2.7.1.51) (gene fucK). - Gluconokinase (EC 2.7.1.12) (gene gntK). - Glycerokinase (EC
2.7.1.30) (gene glpK). - Xylulokinase (EC 2.7.1.17) (gene xylB). - L-xylulose kinase (EC 2.7.1.53) (gene lyxK). These
enzymes are proteins of from 480 to 520 amino acid residues. As consensus patterns for this family of kinases two
conserved regions were selected, one in the central section, the other in the C-terminal section.
[0581] Consensus pattern: [MFYGS]-x-[PST]-x(2)-K-[LIVMFYW]-x-W-[LIVMF]-x-[DENQTKR]-[ENQH]

[ 1] Reizer A., Deutscher J., Saier M.H. Jr., Reizer J. Mol. Microbiol. 5:1081-1089(1991).

[0582] 192. FKBP-type peptidyl-prolyl cis-trans isomerase signatures/profile (FKBP)

FKBP [1,2,3] is the major high-affinity binding protein, in vertebrates, for the immunosuppressive drug FK506. It exhibits
peptidyl-prolyl cis-trans isomerase activity (EC 5.2.1.8) (PPIase or rotamase). PPIase is an enzyme that accelerates
protein folding by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides [4]. At least
three different forms of FKBP are known in mammalian species: - FKBP-12, which is cytosolic and inhibited by both
FK506 and rapamycin. - FKBP-13, which is membrane associated and inhibited by both FK506 and rapamycin. - FKBP

MB^' " '^""9^' • Mammalian hsp binding immunophlBn (HBI) (also called 

P59)^HBI rs a proleni wh.ch binds to hsp90 and contains two FKBP-like domains in ite 1^- teS sS tSSS 
wh«h seems to be functional. - The Ctemiinal part of the cell-surface protein mip f rom LegZra p^t^n ^^Z^ 
wnh macrophage infection by an unknown mechanism. - Escherichia coll slyD [8]. a pr^leTS, a Term^rFKB^ 
l^e^hr^o. A "^^"^ metal-bindhg domain. - Escherichia coTikpA'. - Escheri^,^ ?k B (^BF^r 
- Eschenchia cob sIpA. - Bacterial trigger factor (Tig). - Streptomyces hygroscopus and chrysomallus FK506 hinri^o 
protein. - Chlamydia trachomatis 27 Kd membrane protein. - N Jsseria'Lniniidis sLTcm ^p^^^^^ 
PPiases from Haemophilus influenzae (HI0754). Methanococcus jannaschii (^U0278 and MJOsJ) Pseu^^Js 
fluore^ens and Pseudor^nase aerug^osa. Two signature patterns for these proteins we^ de^^J Se^^ed 

Consensus pattern: [LIVMC]-x-[YF]-x-[GVL]-x(1,2)-[LFT]-x(2)-G-x(3)-[DE]-[STAEQK]-[STAN]

Sx(2^j;5rF^-G''^ ''-'V'^'^W2HGA]-x(3.4).(UVMF]-x(2)-[UVMFHKhx(2)-G^ 

[ 4] Fischer G., Schmid F.X. Biochemistry 29:2205-2212(1990).

[ 5] Trandinh C.C., Pao G.M., Saier M.H. Jr. FASEB J. 6:3410-3420(1992).
[ 6] Galat A. Eur. J. Biochem. 216:689-707(1993).

[ 7] Hacker J., Fischer G. Mol. Microbiol. 10:445-456(1993).

[ 8] Wuelfing C., Lombardero J., Plueckthun A. J. Biol. Chem. 269:2895-2901(1994).

[0586] 193. MAPEG family (aka: FLAP/GST2/LTC4S family signature)
[0587] The following mammalian proteins are evolutionarily related [1]:

- Leukotriene C4 synthase (EC 2.5.1.37) (gene LTC4S), an enzyme that catalyzes the production of LTC4 from LTA4.

- Microsomal glutathione S-transferase II (EC 2.5.1.18) (GST-II) (gene GST2), an enzyme that can also produce
LTC4 from LTA4.

- 5-lipoxygenase activating protein (gene FLAP), a protein that seems to be required for the activation of
5-lipoxygenase.

[0588] These are proteins of 150 to 160 residues that contain three transmembrane segments. As a signature pattern,
a conserved region between the first and second transmembrane domains was selected.
[0589] Consensus pattern: G-x(3)-F-E-R-V-[FY]-x-A-[NQ]-x-N-C
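
The two properties stated for this family, a length of 150 to 160 residues and the signature between the first two transmembrane segments, can be combined into a simple screen. The Python sketch below is illustrative only: it assumes the pattern reads G-x(3)-F-E-R-V-[FY]-x-A-[NQ]-x-N-C, the hard length cut-off is an assumption made for the example, and the function name and sequences are hypothetical.

```python
import re

# Assumed reading of the pattern: G-x(3)-F-E-R-V-[FY]-x-A-[NQ]-x-N-C
MAPEG_SIGNATURE = re.compile(r"G.{3}FERV[FY].A[NQ].NC")

def looks_like_mapeg(sequence, min_len=150, max_len=160):
    """Screen a sequence by overall length and presence of the signature.

    The strict length window is an assumption of this sketch; the text only
    reports that known members are 150 to 160 residues long.
    """
    if not (min_len <= len(sequence) <= max_len):
        return False
    return MAPEG_SIGNATURE.search(sequence) is not None

# Made-up sequences for illustration only.
motif = "GAAAFERVFAANANC"
candidate = "M" * 50 + motif + "L" * 90    # 155 residues, contains the signature
too_short = "M" * 10 + motif               # signature present, only 25 residues

print(looks_like_mapeg(candidate), looks_like_mapeg(too_short))
```

A real screen would probably relax the length window rather than reject borderline sequences outright; the window is exposed as parameters for that reason.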

[0590] [1] Jakobsson P.-J., Mancini J.A., Ford-Hutchinson A.W. J. Biol. Chem. 271:22203-22210(1996).
[0591] 194. FMN-dependent alpha-hydroxy acid dehydrogenases active site (FMN_dh)

A number of oxidoreductases that act on alpha-hydroxy acids and which are FMN-containing flavoproteins have been
shown [1,2,3] to be structurally related; these enzymes are: - Lactate dehydrogenase (EC 1.1.2.3), which consists of
a dehydrogenase domain and a heme-binding domain called cytochrome b2 and which catalyzes the conversion of
lactate into pyruvate. - Glycolate oxidase (EC 1.1.3.15) ((S)-2-hydroxy-acid oxidase), a peroxisomal enzyme that
catalyzes the conversion of glycolate and oxygen to glyoxylate and hydrogen peroxide. - Long chain alpha-hydroxy acid
oxidase from rat (EC 1.1.3.15), a peroxisomal enzyme. - Lactate 2-monooxygenase (EC 1.13.12.4) (lactate oxidase)
from Mycobacterium smegmatis, which catalyzes the conversion of lactate and oxygen to acetate, carbon dioxide and
water. - (S)-mandelate dehydrogenase from Pseudomonas putida (gene mdlB), which catalyzes the reduction of
(S)-mandelate to benzoylformate. The first step in the reaction mechanism of these enzymes is the abstraction of the
proton from the alpha-carbon of the substrate, producing a carbanion which can subsequently attach to the N5 atom
of FMN. A conserved histidine has been shown [4] to be involved in the removal of the proton. The region around this
active site residue is highly conserved and contains an arginine residue which is involved in substrate binding.
[0592] Consensus pattern: S-N-H-G-[AG]-R-Q [H is the active site residue] [R is a substrate-binding residue]

[ 1] Giegel D.A., Williams C.H. Jr., Massey V. J. Biol. Chem. 265:6626-6632(1990).

[ 2] Tsou A.Y., Ransom S.C., Gerlt J.A., Buechter D.D., Babbitt P.C., Kenyon G.L. Biochemistry 29:9856-9862
(1990).

[ 3] Le K.H.D., Lederer F. J. Biol. Chem. 266:20877-20880(1991).
[ 4] Lindqvist Y., Branden C.-I. J. Biol. Chem. 264:3624-3628(1989).

[0593] 195. Flavin-binding monooxygenase-like (FMO-like)

[0594] This family includes FMO proteins, cyclohexanone monooxygenase 

[0595] 196. (FPGS)

Folylpolyglutamate synthase signatures (aka Mur_ligase)

[0596] Folylpolyglutamate synthase (EC 6.3.2.17) (FPGS) [1] is the enzyme of folate metabolism that catalyzes
ATP-dependent addition of glutamate moieties to tetrahydrofolate.

[0597] Its sequence is moderately conserved between prokaryotes (gene folC) and eukaryotes. We developed two
signature patterns based on the conserved regions which are rich in glycine residues and could play a role in the
catalytic activity and/or in substrate binding.

[0598] Consensus pattern: [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-[LIVM](2)-x(3)-[GSK] Sequences
known to belong to this class detected by the pattern: ALL.

[0599] Consensus pattern: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2) Sequences known to
belong to this class detected by the pattern: ALL.

[0600] [ 1] Shane B., Garrow T., Brenner A., Chen L., Choi Y.J., Hsu J.C., Stover P. Adv. Exp. Med. Biol. 338:629-634
(1993).

[0601] 197. FYVE zinc finger

[0602] The FYVE zinc finger is named after four proteins that it has been found in: Fab1, YOTB/ZK632.12, Vac1,
and EEA1. The FYVE finger has been shown to bind two Zn++ ions [1]. The FYVE finger has eight potential zinc
coordinating cysteine positions. Many members of this family also include two histidines in a motif R+HHC+XCG, where
+ represents a charged residue and X any residue. Members were included which do not conserve these histidine
residues but are clearly related.
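
The motif R+HHC+XCG can be written as a regular expression once "charged residue" is given a concrete meaning. The Python sketch below is illustrative only: treating D, E, K and R as the charged residues is an assumption (histidine is left out), and the relaxed form for members lacking the two histidines, the function name and the test peptides are all hypothetical.

```python
import re

# "+" in R+HHC+XCG denotes a charged residue. Using D, E, K and R here is an
# assumption of this sketch, not a definition taken from the text.
CHARGED = "DEKR"
STRICT = re.compile(rf"R[{CHARGED}]HHC[{CHARGED}].CG")
# Relaxed form for members that do not conserve the two histidines.
RELAXED = re.compile(rf"R[{CHARGED}]..C[{CHARGED}].CG")

def fyve_core(sequence, require_histidines=True):
    """Return the 0-based start of the first motif match, or None if absent."""
    regex = STRICT if require_histidines else RELAXED
    match = regex.search(sequence)
    return match.start() if match else None

with_his = "AARKHHCRACGAA"      # made-up peptide carrying the full motif
without_his = "AARKQQCRACGAA"   # made-up peptide lacking the two histidines

print(fyve_core(with_his))
print(fyve_core(without_his))
print(fyve_core(without_his, require_histidines=False))
```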

[0603] [1] Stenmark H, Aasland R, Toh BH, D'Arrigo A. J Biol Chem 1996;271:24048-24054 [2] Gaullier JM,
Simonsen A, D'Arrigo A, Bremnes B, Stenmark H, Aasland R. Nature 1998;394:432-433

[0604] 198. F_actin_cap_B 

F-actin capping protein beta subunit signature

[0605] F-actin capping protein binds in a calcium-independent manner to the fast growing ends of actin filaments
(barbed end) thereby blocking the exchange of subunits at these ends. Unlike gelsolin and severin this protein does
not sever actin filaments. The F-actin capping protein is a heterodimer composed of two unrelated subunits: alpha and
beta. The beta subunit is a protein of about 280 amino acid residues whose sequence is well conserved in eukaryotic
species [1]. As a signature pattern, a highly conserved hexapeptide in the N-terminal section of the beta subunit was
selected.

S ^rTTTn Sequences known to belong to this class detected by the iTem ^ 

S QQ^"''^i^-Kp^ ^"^^ • ^ ■ ^P^^ J A. Nature 344:352-354(1990) 
[0609] 199. Isopenicillin N synthetase signatures (Fe_Asc_oxidored)

Isopenicillin N synthetase (IPNS) [1,2] is a key enzyme in the biosynthesis of penicillin and cephalosporin. In the
presence of oxygen, iron and ascorbate, it removes four hydrogen atoms from L-(alpha-aminoadipyl)-L-cysteinyl-D-valine
to form the azetidinone and thiazolidine rings of isopenicillin. IPNS is an enzyme of about 330 amino-acid residues.
Two cysteines are conserved in fungal and bacterial IPNS sequences; these may be involved in iron-binding and/or
substrate-binding. Cephalosporium acremonium DAOCS/DACS [3] is a bifunctional enzyme involved in cephalosporin
biosynthesis. The DAOCS domain, which is structurally related to IPNS, catalyzes the step from penicillin N to
deacetoxy-cephalosporin C, which is used as a substrate by DACS to form deacetylcephalosporin C. Streptomyces
clavuligerus possesses a monofunctional DAOCS enzyme (gene cefE) [4] also related to IPNS. Two signature patterns
for these enzymes were derived, centered around the conserved cysteine residues.
[0610] Consensus pattern: [RK]-x-[STA]-x(2)-S-x-C-Y-[SL]
Consensus pattern: [LIVM](2)-x-C-G-[STA]-x(2)-[STAG]-x(2)-T-x-[DNG]

[ 1] Martin J.F. Trends Biotechnol. 5:306-308(1987).

[ 2] Chen G., Shiffman D., Mevarech M., Aharonowitz Y. Trends Biotechnol. 8:105-111(1990).
[ 3] Samson S.M., Dotzlaf J.E., Slisz M.L., Becker G.W., van Frank R.M., Veal L.E., Yeh W.K., Miller J.R., Queener
S.W., Ingolia T.D. Bio/Technology 5:1207-1214(1987).

[ 4] Kovacevic S., Weigel B.J., Tobin M.B., Ingolia T.D., Miller J.R. J. Bacteriol. 171:754-760(1989).
[0611] 200. Fibrillarin signature

Fibrillarin [1] is a component of a nucleolar small nuclear ribonucleoprotein (SnRNP) particle thought to ...

[0612] Consensus pattern: [GST]-[LIVMAP]-V-Y-A-[IV]-E-[FY]-[SA]-x-R-x(2)-R-[DE]

[ 1] Aris J.P., Blobel G. Proc. Natl. Acad. Sci. U.S.A. 88:931-935(1991).

[ 2] Bandziulis R.J., Swanson M.S., Dreyfuss G. Genes Dev. 3:431-437(1989).

[ 3] Agha-Amiri K. J. Bacteriol. 176:2124-2127(1994).

[0613] 201. Filamin/ABP280 repeat

L;^2'SL'yUrsTeS;e^ 1997;4:223-230. 

E 3 2ii?e";;t °' '"""^"^^^'^""^ ^'^ -^-^9 from GDP-Fucose to GlcNAc . an 

[0617] [1] Breton C, Oriol R, Imberty A; Glycobiology 1998;8:87-94
[0618] 203. 2Fe-2S ferredoxins, iron-sulfur binding region signature (fer2A)






Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide variety of metabolic
reactions. Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron-sulfur
cluster(s) and according to sequence similarities. One of these subgroups is the 2Fe-2S ferredoxins, which are
proteins or domains of around one hundred amino acid residues that bind a single 2Fe-2S iron-sulfur cluster. The proteins
that are known [2] to belong to this family are listed below. - Ferredoxin from photosynthetic organisms; namely plants
and algae, where it is located in the chloroplast or cyanelle; and cyanobacteria. - Ferredoxin from archaebacteria of
the Halobacterium genus. - Ferredoxin IV (gene pftA) and V (gene fdxD) from Rhodobacter capsulatus. - Ferredoxin
in the toluene degradation operon (gene xylT) and naphthalene degradation operon (gene nahT) of Pseudomonas
putida. - Hypothetical Escherichia coli protein yfaE. - The N-terminal domain of the bifunctional ferredoxin/ferredoxin
reductase electron transfer component of the benzoate 1,2-dioxygenase complex (gene benC) from Acinetobacter
calcoaceticus, the toluene 4-monooxygenase complex (gene tmoF), the toluate 1,2-dioxygenase system (gene xylZ),
and the xylene monooxygenase system (gene xylA) from Pseudomonas. - The N-terminal domain of phenol hydroxylase
protein p5 (gene dmpP) from Pseudomonas putida. - The N-terminal domain of methane monooxygenase component C
(gene mmoC) from Methylococcus capsulatus. - The C-terminal domain of the vanillate degradation pathway
protein vanB in a Pseudomonas species. - The N-terminal domain of bacterial fumarate reductase iron-sulfur protein
(gene frdB). - The N-terminal domain of CDP-6-deoxy-3,4-glucoseen reductase (gene ascD) from Yersinia
pseudotuberculosis. - The central domain of eukaryotic succinate dehydrogenase (ubiquinone) iron-sulfur protein. - The
N-terminal domain of eukaryotic xanthine dehydrogenase. - The N-terminal domain of eukaryotic aldehyde oxidase. In
the 2Fe-2S ferredoxins, four cysteine residues bind the iron-sulfur cluster. Three of these cysteines are clustered
together in the same region of the protein. Our signature pattern spans that iron-sulfur binding region.
[0619] Consensus pattern: C-{C}-{C}-[GA]-{C}-C-[GAST]-{CPDEKRHFYW}-C [The three C's are 2Fe-2S ligands]
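
Elements in braces exclude residues rather than allow them, which maps onto a negated character class in a regular expression. The Python sketch below is illustrative only: it assumes the pattern reads C-{C}-{C}-[GA]-{C}-C-[GAST]-{CPDEKRHFYW}-C, and the function name and test peptides are hypothetical. It reports the positions of the three pattern cysteines, which the text identifies as 2Fe-2S ligands.

```python
import re

# Assumed reading of the pattern: C-{C}-{C}-[GA]-{C}-C-[GAST]-{CPDEKRHFYW}-C
# Each {..} element becomes a negated class; the three cysteines are captured.
FER2_REGEX = re.compile(r"(C)[^C][^C][GA][^C](C)[GAST][^CPDEKRHFYW](C)")

def ligand_positions(sequence):
    """Return 1-based positions of the three pattern cysteines for each match."""
    hits = []
    for match in FER2_REGEX.finditer(sequence):
        hits.append(tuple(match.start(group) + 1 for group in (1, 2, 3)))
    return hits

matching = "MAACRTGSCSSCAG"   # made-up peptide that satisfies the pattern
excluded = "CCTGSCSSC"        # made-up peptide; a cysteine sits in an excluded position

print(ligand_positions(matching))
print(ligand_positions(excluded))
```

The fourth cluster ligand lies outside the signature region, so it is not reported by this sketch.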

[ 1] Meyer J. Trends Ecol. Evol. 3:222-226(1988). [ 2] Harayama S., Polissi A., Rekik M. FEBS Lett. 285:85-88(1991).
[0620] Adrenodoxin family, iron-sulfur binding region signature (fer2B)

Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide variety of metabolic
reactions. Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron-sulfur
cluster(s) and according to sequence similarities. One family of ferredoxins groups together the following proteins that
all bind a single 2Fe-2S iron-sulfur cluster: - Adrenodoxin (ADX) (adrenal ferredoxin), a vertebrate mitochondrial protein
which transfers electrons from adrenodoxin reductase to cytochrome P450scc, which is involved in cholesterol side
chain cleavage. - Putidaredoxin (PTX), a Pseudomonas putida protein which transfers electrons from putidaredoxin
reductase to cytochrome P450-cam, which is involved in the oxidation of camphor. - Terpredoxin [2], a Pseudomonas
protein which transfers electrons from terpredoxin reductase to cytochrome
P450-terp, which is involved in the oxidation of alpha-terpineol. - Rhodocoxin [3], a Rhodococcus protein which transfers
electrons from rhodocoxin reductase to cytochrome CYP116 (thcB), which is involved in the degradation of
thiocarbamate herbicides. - Escherichia coli ferredoxin (gene fdx) [4] whose exact function is not yet known. - Rhodobacter
capsulatus ferredoxin VI [5], which may transfer electrons to a yet uncharacterized oxygenase. - Caulobacter
crescentus ferredoxin (gene fdxB) [6]. In these proteins, four cysteine residues bind the iron-sulfur cluster. Three of these
cysteines are clustered together in the same region of the protein. Our signature pattern spans that iron-sulfur binding
region.

[0621] Consensus pattern: C-x(2)-[STAQ]-x-[STAMV]-C-[STA]-T-C-[HR] [The three C's are 2Fe-2S ligands]
[ 1] Meyer J. Trends Ecol. Evol. 3:222-226(1988).

[ 2] Peterson J.A., Lu J.-Y., Geisselsoder J., Graham-Lorence S., Carmona C., Witney F., Lorence M.C. J. Biol.
Chem. 267:14193-14203(1992).
[ 3] Nagy I., Schoofs G., Compernolle F., Proost P., Vanderleyden J., De Mot R. J. Bacteriol. 177:676-687(1995).
[ 4] Ta D.T., Vickery L.E. J. Biol. Chem. 267:11120-11125(1992).

[ 5] Naud I., Vincon M., Garin J., Gaillard J., Forest E., Jouanneau Y. Eur. J. Biochem. 222:933-939(1994).
[ 6] Amemiya K. EMBL/GenBank: X51607.

[0622] 204. 4Fe-4S ferredoxins, iron-sulfur binding region signature (fer4)

Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide variety of metabolic
reactions. Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron-sulfur
cluster(s). One of these subgroups is the 4Fe-4S ferredoxins, which are found in bacteria and which are thus often
referred to as 'bacterial-type' ferredoxins. The structure of these proteins [2] consists of the duplication of a domain of
twenty six amino acid residues; each of these domains contains four cysteine residues that bind to a 4Fe-4S center.
A number of proteins have been found [3] that include one or more 4Fe-4S binding domains similar to those of
bacterial-type ferredoxins. These proteins are listed below (references are only provided for recently determined sequences). -
[0623] The iron-sulfur proteins of the succinate dehydrogenase and the fumarate reductase complexes (EC 1.3.99.1).
These enzyme complexes, which are components of the tricarboxylic acid cycle, each contain three subunits: a
flavoprotein, an iron-sulfur protein, and a b-type cytochrome. The iron-sulfur proteins contain three different iron-sulfur
centers: a 2Fe-2S, a 3Fe-3S and a 4Fe-4S. - Escherichia coli anaerobic glycerol-3-phosphate dehydrogenase (EC



This enzyme is composed of three subunits: A, B, and C. The C subunit seems to be an iron-sulfur protein
with two ferredoxin^ike domains in the N4eiminal part of the protein - Escherichia Si^erLt ZLTh? k . 

- Eschenchia coli fonnate hydrogenlyase. Two of the subunits of this oligomeric complex foenes hvcB I^^^h^» 

aenydtogenase (EC rzjj). This enzyme is used by the archaebacteria to grow on formate The beta chain of th« 

three 4Fe-4S centers, two' of'U^ i SS^nTe^^iSiigS ^^S^a? 

reouctase (EC 1 .8^.-) (4). Two of the subunits of this enzyme (genes asrA and asrC) seem to both bind twr> 4Faw1S 

centers. - A Ferredoxin^ike protein (gene fixX) from the nitrogen-fixation oenes locus ?rrk!.rnhi,^. 

and one from the Nif-region of Azotobacter species - The 9 Kd ^*^n?i^!^ . ''P^^^. 

P^C).™sp.telnccntainstwo.wpotentl^^^^^ 

protein which is predicted to carry two 4Fe-4S centers - An fftrr«rtnvm #r.«, « ^-^ners. i ne cnroropiast f rxB 

En.amobeaf.sto^ca-EscheS.^cc.i^eSp^^^^ 

radK:al ac^«at«,g enzymes family (see <PDOC0g834» and two potential 4Fe-4S centers ^stS^l r^ 

.dues «! the iron-sulfur region is sufficient todetect this class of 4Fe-4S binding proteins ^ 
[0624] Consensus pattern: C-x(2)-C-x(2)-C-x(3)-C-[PEG] [The four C's are 4Fe-4S ligands]
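
Because a protein may carry one or more 4Fe-4S binding domains, it can be useful to list every occurrence of the signature rather than only the first. The Python sketch below is illustrative only: it assumes the pattern reads C-x(2)-C-x(2)-C-x(3)-C-[PEG], and the function name and the test sequence are hypothetical.

```python
import re

# C-x(2)-C-x(2)-C-x(3)-C-[PEG], written directly as a regular expression.
FER4_REGEX = re.compile(r"C.{2}C.{2}C.{3}C[PEG]")

def find_4fe4s_motifs(sequence):
    """Return the 0-based start of each non-overlapping signature occurrence."""
    return [match.start() for match in FER4_REGEX.finditer(sequence)]

# Made-up sequence with two copies of the motif separated by a short linker.
seq = "MK" + "CAACAACAAACP" + "GGSGG" + "CTTCTTCTTTCE" + "A"

starts = find_4fe4s_motifs(seq)
print(len(starts), starts)
```

Two hits in one sequence would be consistent with the duplicated domain described for bacterial-type ferredoxins, although a signature match alone does not establish cluster binding.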



[ 1] Meyer J. Trends Ecol. Evol. 3:222-226(1988). 

[ 2] Otaka E., Ooi T. J. Mol. Evol. 26:257-267(1987).

[ 3] Beinert H. FASEB J. 4:2483-2492(1990).

[ 4] Huang C.J., Barrett E.L. J. Bacteriol. 173:1544-1553(1991).

[ 5] Knaff D.B. Trends Biochem. Sci. 13:460-461(1988).



[0625] 205. NifH/frxC family signatures (fer4_NifH) 

Nttrogenase (ECiiaRI) HJ is the enzyme system responsible for biological nilrooen fixation NHrono„=..» v 

such as ferredoxin; the reduced protein then transfers electrons to comuonent 1 with fhl «™Lf. ^ ^ 
ATP. A number of proteins are known to be evolutk,nary reimTto n^f^pr^ 1. 
(or chlL) protein 13]. FrxC is encoded on the chteroplas'; genome of ^r^^;, s^ : e;^^^^^^^^^ TrSZ':!^ 

proteins bchL and bchX [4). These proteins are also likely to play a role in chloroohvll svnthesis Th^^Tll^l^rT 

Consensus pattern: D-x-L-G-D-V-V-C-G-G-F-[AG]-x-P [C binds the iron-sulfur center]
[ 1] Pau R.N. Trends Biochem. Sci. 14:183-186(1989).

[ 2] Georgiadis M.M., Komiya H., Chakrabarti P., Woo D., Kornuc J.J., Rees D.C. Science 257:1653-1659(1992).
[ 3] Fujita Y., Takahashi Y., Kohchi T., Ozeki K., Ohyama K., Matsubara H. Plant Mol. Biol. 13:551-561(1989).
[ 4] Burke D.H., Alberti M., Hearst J.E. J. Bacteriol. 175:2407-2413(1993).

[0627] 206. Ferritin iron-binding regions signatures
ions can gain access to the central cavity of the molecule; this pattern also includes conserved acidic residues which
are potential metal-binding sites.

[0628] Consensus pattern: E-x-[KR]-E-x(2)-E-[KR]-[LF]-[LIVMA]-x(2)-Q-N-x-R-x-G-R [The 3 E's are potential iron
ligands]

Consensus pattern: D-x(2)-[LIVMF]-[STAC]-[DH]-F-[LI]-[EN]-x(2)-[FY]-L-x(6)-[LIVM]-[KN] [The second D and the E are
potential iron ligands]

[ 1] Crichton R.R., Charloteaux-Wauters M. Eur. J. Biochem. 164:485-506(1987).
[ 2] Theil E.C. Annu. Rev. Biochem. 56:289-315(1987).

[ 3] Ragland M., Briat J.-F., Gagnon J., Laulhere J.-P., Massenet O., Theil E.C. J. Biol. Chem. 265:18339-18344
(1990).
[0629] 207. Intermediate filaments signature (filament)

Intermediate filaments (IF) [1,2,3] are proteins which are primordial components of the cytoskeleton and the nuclear
envelope. They generally form filamentous structures 8 to 14 nm wide. IF proteins are members of a very large multigene
family of proteins which has been subdivided in five major subgroups: - Type I: Acidic cytokeratins. - Type II: Basic
cytokeratins. - Type III: Vimentin, desmin, glial fibrillary acidic protein (GFAP), peripherin, and plasticin. - Type IV:
Neurofilaments L, H and M, alpha-internexin and nestin. - Type V: Nuclear lamins A, B1, B2 and C. All IF proteins are
structurally similar in that they consist of: a central rod domain comprising some 300 to 350 residues which is arranged
in coiled-coiled alpha-helices, with at least two short characteristic interruptions; a N-terminal non-helical domain (head)
of variable length; and a C-terminal domain (tail) which is also non-helical, and which shows extreme length variation
between different IF proteins. While IF proteins are evolutionarily and structurally related, they have limited sequence
homologies except in several regions of the rod domain. A conserved region at the C-terminal extremity of the rod
domain was used as a sequence pattern for this class of proteins.
[0630] Consensus pattern: [IV]-x-[TACI]-Y-[RKH]-x-[LM]-L-[DE]

[ 1] Quinlan R., Hutchison C., Lane B. Protein Prof. 2:801-952(1995).
[ 2] Steinert P.M., Roop D.R. Annu. Rev. Biochem. 57:593-625(1988).
[ 3] Stewart M. Curr. Opin. Cell Biol. 2:91-100(1990).

[0631] 208. Flavodoxin signature

Flavodoxins [1,2,3] are electron-transfer proteins that function in various electron transport systems. Flavodoxins bind
one FMN molecule, which serves as a redox-active prosthetic group. Flavodoxins are functionally interchangeable with
ferredoxins. They have been isolated from prokaryotes, cyanobacteria, and some eukaryotic algae. The signature
pattern for these proteins is derived from a conserved region in their N-terminal section; this region is involved in the
binding of the FMN phosphate group.

[0632] Consensus pattern: [LIV]-[LIVFY]-[FY]-x-[ST]-x(2)-[AGC]-x-T-x(3)-A-x(2)-[LIV]
[ 1] Wakabayashi S., Kimura K., Matsubara H., Rogers L.J. Biochem. J. 263:981-984(1989).
[0633] 209. Growth factor and cytokines receptors family signatures (fn3)

A number of receptors for lymphokines, hematopoietic growth factors and growth hormone-related molecules have
been found [1 to 5] to share a common binding domain. Receptors known to belong to this family are: - Cytokine
receptor common beta chain. This chain is common to the IL-3, IL-5 and GM-CSF receptors. - Cytokine receptor
common gamma chain. This chain is common to the IL-2, IL-4, IL-7 and IL-13 receptors. - Ciliary neurotrophic factor
receptor (CNTFR). - Erythropoietin receptor (EPOR). - Granulocyte colony-stimulating factor receptor (G-CSFR). -
Granulocyte-macrophage colony-stimulating factor receptor alpha chain (GM-CSFR). - Interleukin-2 receptor beta
chain (IL2R-beta). - Interleukin-3 receptor alpha chain (IL3R). - Interleukin-4 receptor alpha chain (IL4R). -
Interleukin-5 receptor alpha chain (IL5R). - Interleukin-6 receptor (IL6R). - Interleukin-7 receptor alpha chain (IL7R). -
Interleukin-9 receptor (IL9R). - Growth hormone receptor (GRHR). - Prolactin receptor (PRLR). - Thrombopoietin receptor (TPOR).
The conserved region constitutes all or part of the extracellular ligand-binding region and is about 200 amino acid
residues long. In the N-terminal of this domain there are two pairs of cysteines known, in the growth hormone receptor,
to be involved in disulfide bonds.
[Schematic: extracellular region with four conserved cysteines (C) joined in pairs by disulfide bonds, followed by the
transmembrane segment (XXXXXXX) and the cytoplasmic region.]
Two patterns to detect this family of receptors were used. The first one is derived from the first N-terminal disulfide
loop, the second is a tryptophan-rich pattern located at the C-terminal extremity of the extracellular region.
[0634] Consensus pattern: C-[LVFYR]-x(7,8)-[STIVDN]-C-x-W [The two C's are linked by a disulfide bond]
Consensus pattern: [STGL]-x-W-[SG]-x-W-S
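
The first pattern contains a variable-length gap, x(7,8), and the family is described by two signatures at opposite ends of the extracellular region. The Python sketch below is illustrative only: it assumes the patterns read C-[LVFYR]-x(7,8)-[STIVDN]-C-x-W and [STGL]-x-W-[SG]-x-W-S, and the function name and test sequence are hypothetical. It reports the span of each signature so that their order along the sequence can be checked.

```python
import re

# Assumed readings of the two patterns given in the text.
DISULFIDE_LOOP = re.compile(r"C[LVFYR].{7,8}[STIVDN]C.W")   # x(7,8) becomes .{7,8}
TRP_RICH = re.compile(r"[STGL].W[SG].WS")

def scan_receptor_signatures(sequence):
    """Return the (start, end) span of the first hit of each signature, or None."""
    loop = DISULFIDE_LOOP.search(sequence)
    trp = TRP_RICH.search(sequence)
    return {
        "disulfide_loop": loop.span() if loop else None,
        "trp_rich": trp.span() if trp else None,
    }

# Made-up sequence: a loop with a 7-residue gap, a linker, then the Trp-rich motif.
seq = "M" + "CLAAAAAAASCAW" + "GGGG" + "SAWSAWS" + "K"

print(scan_receptor_signatures(seq))
```

Spans are 0-based and end-exclusive, following the convention of Python match objects.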

[ 1] Bazan J.F. Biochem. Biophys. Res. Commun. 164:788-795(1989).
[ 2] Bazan J.F. Proc. Natl. Acad. Sci. U.S.A. 87:6934-6938(1990).

^- "S^SoS '""'"^ ""^^ ^"^^ ""^^ '^^^ 

[ 4] D'Andrea A.D., Fasman G.D., Lodish H.F. Cell 58:1023-1024(1989).

[ 5] D'Andrea A.D., Fasman G.D., Lodish H.F. Curr. Opin. Cell Biol. 2:648-651(1990).

[0635] 210. Phosphoribosylglycinamide formyltransferase active site (formyl_transf)

Phosphoribosylglycinamide formyttransterase (EC 2i2J) (GART) (1] catalyzes the third step in de novo purine bio- 
syn hes« the transfer of a fonmyl group to 5'-phosphonl»sylglycinamide. h higher euJcaryS^^irSt 
multrfunctK^al enzyme polypeptide that catalyzes three of the steps of purine tesyntheJsirb^ 
yeast. GARTisaj«XK,,unctionalpro.einofabom200amin<>«:ld 

acid resKJue has been shown to be involved in the cata^rtic mechanism. The region around tS acZTtte restS^ 
well consented in GART from prokaryotic and eukaryotic sources and can be us^J as a signarure Sttem Si^^r«n 
ton^ltetrahydrofolate dehydrogenase (ECIJJJ) t2J is a cytosolicenzyme respo^rmf^^^"^ 
cartK,xylaWe reductcn of 10-fomvltet,ahydrofolate into fetrahydrololate. It isa^ein of abo« 900^a^^kLc^ 

SliJlT ^T*"'; *^ <200 residues) is stn«turally elated to G^Sfs^n^rS^' 
m^nyMRNA formynransferase (EC 2JJJ) (gene fmt) {3r« the enzyme ^ 

I re^.'""^ "^^^^ 

S ID isT:?cL ^ZJeP''''''''^-''-''^'''^^^ X(2)4UVn-x(6,. 

[ 1] Inglese J., Smith J.M., Benkovic S.J. Biochemistry 29:6678-6687(1990).
[ 2] Cook R.J., Lloyd R.S., Wagner C. J. Biol. Chem. 266:4965-4973(1991).

[ 3] Guillon J.-M., Mechulam Y., Schmitter J.-M., Blanquet S., Fayat G. J. Bacteriol. 174:4294-4301(1992).
[0637] 211. G10 protein signatures

A Xenopus protein known as G10 [1] has been found to be highly conserved in a wide range of eukaryotic species.
The function of GIG is stHI unknown. GIG is a protein of about 17 to in kh iiA?»t« i ,1 v ^"•^/y*"^ species. 

[0638] Consensus pattern: L-C-C-x-[KR]-C-x(4)-[DE]-x-N-x(4)-C-x-C-R-V-P
Consensus pattern: C-x-H-C-G-C-[KRH]-G-C-[SA]

S LILTpl^^in a^ha^Snr ' ' - ""^''^ ^""^ ^^^^^^^^l 

G proteins couple receptors of extracellular signals to intracellular signaling pathways. The G protein alpha
subunit binds guanyl nucleotide and is a weak GTPase. Number of members: 195

IS! H^^r^^- ^"'^H."'" "-^^ ^' "-'"^^^ '^^^ SP-^^S SR. Science 1994:265:1405-1412 

[2] How G proteins work: a continuing story. Coleman DE, Sprang SR; Trends Biochem Sci 1996;21:41-44.

[0642] 213. Glucose-6-phosphate dehydrogenase active site (G6PD)

Glucose-6-phosphate dehydrogenase (EC 1.1.1.49) (G6PD) [1] catalyzes the first step in the pentose pathway, the

Consensus pattern: D-H-Y-L-G-K-[EQK] [K is the active site residue]

GATA-4 [4). a transcnptional activator expressed in endodermally derived tissues and heart nr~^i!^if . 
f^nier (or DGATAa) (gene pnr) which acts as a repressor of the achae.e-S^e cJJ^ptex (a^^ moSSJS 
(5). whK^ regulates .heexpressionofchorongenes.-Caenorhabc«tiseleganse.t-1anS^^^^ 
section of these enzymes has been selected as a signature pattern for this family of TK.

Consensus pattern: [GA]-x(1,2)-[DE]-x-Y-x-[STAP]-x-C-[NKR]-x-[CH]-[LIVMFYWH]
[ 1] Boyle D.B., Coupar B.E.H., Gibbs A.J., Seigman L.J., Both G.W. Virology 156:355-365(1987).
[ 2] Blasco R., Lopez-Otin C., Munoz M., Bockamp E.-O., Simon-Mateo C., Vinuela E. Virology 178:301-304(1990).
[ 3] Robertson G.R., Whalley J.M. Nucleic Acids Res. 16:11303-11317(1988).

[1523] 641. Thymidine kinase from herpesvirus (TK herpes)
[1]

Medline: 96003730

Crystal structures of the thymidine kinase from herpes simplex virus type-1 in complex with deoxythymidine and
ganciclovir.

Brown DG, Visse R, Sandhu G, Davies A, Rizkallah PJ, Melitz C, Summers WC, Sanderson MR;
Nat Struct Biol 1995;2:876-881.
Number of members: 65

IS [1524] 642. Nuclear transition protein 2 signatures (TP2) 

In mammals, the second stage of spermatogenesis is characterized by the conversion of nucleosomal chromatin to
the compact, non-nucleosomal and transcriptionally inactive form found in the sperm nucleus. This condensation is
associated with a double-protein transition. The first transition corresponds to the replacement of histones by several
spermatid-specific proteins, also called transition proteins, which are themselves replaced by protamines during the
second transition. Nuclear transition protein 2 (TP2) is one of those spermatid-specific proteins. TP2 is a basic,
zinc-binding protein [1] of 116 to 137 amino-acid residues. Structurally, TP2 consists of three distinct parts: a conserved
serine-rich N-terminal domain of about 25 residues, a variable central domain of 20 to 50 residues which contains
cysteine residues, and a conserved C-terminal domain of about 70 residues rich in lysines and arginines. Two signature
patterns for TP2 have been developed: one located in the N-terminal domain, the other in the C-terminal domain.

Consensus pattern: H-x(3)-H-S-[NS]-S-x-P-Q-S

Consensus pattern: K-x-R-K-x(2)-E-G-K-x(2)-K-[KR]-K

[1] Baskaran R., Rao M.R.S. Biochem. Biophys. Res. Commun. 179:1491-1499(1991).
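
The consensus patterns in this listing are written in PROSITE syntax, which maps closely onto regular expressions. As a minimal sketch (the helper name `prosite_to_regex` and the toy sequence are illustrative assumptions, not part of the specification), the two TP2 patterns could be translated and applied as follows:

```python
import re

# One pattern element: a residue, 'x', an [allowed] or {forbidden} class,
# optionally followed by a repeat count such as (3) or (2,5).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a regex string.

    Hypothetical helper for illustration; it only covers the syntax used
    in this listing (no '<' or '>' anchors).
    """
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            token = "."                      # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            token = core                     # literal residue or [class]
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


tp2_n_terminal = prosite_to_regex("H-x(3)-H-S-[NS]-S-x-P-Q-S")
tp2_c_terminal = prosite_to_regex("K-x-R-K-x(2)-E-G-K-x(2)-K-[KR]-K")
print(tp2_n_terminal)  # H.{3}HS[NS]S.PQS
print(tp2_c_terminal)  # K.RK.{2}EGK.{2}K[KR]K

# Toy sequence invented for the example; it is not a real TP2 protein.
toy = "MDTHAAAHSNSAPQSTT"
hit = re.search(tp2_n_terminal, toy)
print(hit.start() + 1, hit.group())  # 4 HAAAHSNSAPQS
```

Positions are reported 1-based, as is usual for protein sequences. An element that the helper does not recognise raises an error instead of being silently skipped, which helps catch typing errors in a pattern.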
[1525] 643. Thiamine pyrophosphate enzymes signature (TPP enzymes)

A number of enzymes require thiamine pyrophosphate (TPP) (vitamin B1) as a cofactor. It has been shown [1] that
some of these enzymes are structurally related. These related TPP enzymes are: - Pyruvate oxidase (POX) (EC 1.2.3.3).
Reaction catalyzed: pyruvate + orthophosphate + O(2) + H(2)O = acetyl phosphate + CO(2) + H(2)O(2). - Pyruvate
decarboxylase (PDC) (EC 4.1.1.1). Reaction catalyzed: pyruvate = acetaldehyde + CO(2). - Indolepyruvate
decarboxylase (EC 4.1.1.74) [2]. Reaction catalyzed: indole-3-pyruvate = indole-3-acetaldehyde + CO(2). - Acetolactate
synthase (ALS) (EC 4.1.3.18). Reaction catalyzed: 2 pyruvate = acetolactate + CO(2). - Benzoylformate decarboxylase (BFD)
(EC 4.1.1.7) [3]. Reaction catalyzed: benzoylformate = benzaldehyde + CO(2). A conserved region which is located in
their C-terminal section has been selected as a signature pattern for these enzymes.
Consensus pattern: [LIVMF]-[GSA]-x(5)-P-x(4)-[LIVMFYW]-x-[LIVMF]-x-G-D-[GSA]-[GSAC]

[1] Green J.B.A. FEBS Lett. 246:1-5(1989). [2] Koga J., Adachi T., Hidaka H. Mol. Gen. Genet. 226:10-16(1991). [3]
Tsou A.Y., Ransom S.C., Gerlt J.A., Buechter D.D., Babbitt P.C., Kenyon G.L. Biochemistry 29:9856-9862(1990).
[1526] 644. TPR Domain

[1] Medline: 95397415

Tetratrico peptide repeat interactions: to TPR or not to TPR?
Lamb JR, Tugendreich S, Hieter P;
Trends Biochem Sci 1995;20:257-259.

[2] Medline: 98151343

The structure of the tetratricopeptide repeats of protein phosphatase 5: implications for TPR-mediated protein-protein
interactions.

Das AK, Cohen PW, Barford D;
EMBO J 1998;17:1192-1199.

Number of members: 621 

[1527] 645. Uroporphyrin-III C-methyltransferase signatures (TP methylase)

Uroporphyrin-III C-methyltransferase (EC 2.1.1.107) (SUMT) [1,2] catalyzes the transfer of two methyl groups from
S-adenosyl-L-methionine to the C-2 and C-7 atoms of uroporphyrinogen III to yield precorrin-2 via the intermediate
formation of precorrin-1.

SUMT is the first enzyme specific to the cobalamin pathway and precorrin-2 is a common intermediate in the
biosynthesis of corrinoids such as vitamin B12, siroheme and coenzyme F430. The sequences of SUMT from a variety of
eubacterial and archaebacterial species are currently available. In species such as Bacillus megaterium (gene cobA). 
1989). [2] Makrides S., Chitpatima S.T., Bandyopadhyay R., Brawerman G. Nucleic Acids Res. 16:2350-2350(1988).
[3] Pay A., Heberle-Bors E., Hirt H. Plant Mol. Biol. 19:501-503(1992). [4] Stuerzenbaum S.R., Kille P., Morgan A.J.
Biochim. Biophys. Acta 1398:294-304(1998). [5] Rasmussen S.W. Yeast 10:S63-S68(1994).
[1519] 637. TFIIS zinc ribbon domain signature 

Transcription factor S-II (TFIIS) [1] is a eukaryotic protein necessary for efficient RNA polymerase II transcription
elongation, past template-encoded pause sites. TFIIS shows DNA-binding activity only in the presence of RNA polymerase
II. It is a protein of about 300 amino acids whose sequence is highly conserved in mammals, Drosophila, yeast (where
it was first known as PPR2, a transcriptional regulator of URA4, and then as DST1, the DNA strand transfer protein
alpha [2]) and in the archaebacteria Sulfolobus acidocaldarius [3]. This family also includes:

- Vaccinia virus RNA polymerase 30 Kd subunit (rpo30) [4]. - African swine fever virus protein I243L [5]. The best
conserved region of all these proteins contains four cysteines that bind a zinc ion and fold in a conformation

dXe"%s tr^r ^^^^ ^ °' °- — canr r r ^ 

S^^^f^e^oTcs'^i'iS^^^ 

[1] Hirashima S., Hirai H., Nakanishi Y., Natori S. J. Biol. Chem. 263:3858-3863(1988). [2] Nature
353:509-509(1991). [3] Langer D., Zillig W. Nucleic Acids Res. 21:2251-2251(1993). [4]
Ahn B.-Y., Gershon P.D., Jones E.V., Moss B. Mol. Cell. Biol. 10:5433-5441(1990). [5] Rodriguez J.M., Salas M.L.

[1520] 638. Tetrahydrofolate dehydrogenase/cyclohydrolase signatures (THF DHG CYH)
Enzymes that participate in the transfer of one-carbon units are involved in various biosynthetic pathways. In many of
these processes the transfers of one-carbon units are mediated by the coenzyme tetrahydrofolate (THF). Various
reactions generate one-carbon derivatives of THF which can be interconverted between different oxidation states by
formyltetrahydrofolate synthetase (EC 6.3.4.3), methylenetetrahydrofolate dehydrogenase (EC 1.5.1.5 or EC 1.5.1.15)
and methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9). The dehydrogenase and cyclohydrolase activities are
expressed by a variety of multifunctional enzymes: - Eukaryotic C-1-tetrahydrofolate synthase (C1-THF synthase)
»^ '^^'^ ^"^""^ ^ C1-THF synthases are known [1] oneTSS 

Z i^fjT- "^1 '^^'^ ^^'^^^ dehydrogenasLic^;drS?s:i:S 

ri ™f a ^ ^"^'^ al^t 300 amL) acid residueTSi 

S TH^" ^"P""^""^ - mitochondrial bifunctional dehydrogenase/S^SSe K] 

This is a homodimeric NAD-dependent enzyme of about 300 amino acid residues. - Bacterial folD [3]. FolD is a
homodimeric bifunctional NADP-dependent enzyme of about 290 amino acid residues. The sequence of the
dehydrogenase/cyclohydrolase domain is highly conserved in all forms of the enzyme. Two conserved regions have been
selected as signature patterns. The first one is located in the N-terminal part of these enzymes and contains three
acidic residues. The second pattern is a highly conserved sequence of nine residues.

Consensus pattern: [EQl-x-tEQKl-{UVMJ(2)-x(2)-IUVIVIJ-x(2)-{UVMY]-N-x-{DNJ-x(5HUVI^(3^^^ 
Consensus pattern: P-G-G-V-G-P-[MF]-T-[IV]

[1] Shannon K.W., Rabinowitz J.C. J. Biol. Chem. 263:7717-7725(1988). [2] Belanger C., Mackenzie R.E. J. Biol.
Chem. 264:4837-4843(1989). [3] d'Ari L., Rabinowitz J.C. J. Biol. Chem. 266:23953-23958(1991).
[1521] 639. Triosephosphate isomerase active site (TIM)

Triosephosphate isomerase (EC 5.3.1.1) (TIM) [1] is the glycolytic enzyme that catalyzes the reversible interconversion
of glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. TIM plays an important role in several metabolic
pathways. It is a dimer of identical subunits, each of about 250 amino-acid residues. A glutamic acid residue is
involved in the catalytic mechanism [2]. The sequence around the active site residue has been selected as a signature
pattern.
Consensus pattern: [AV]-Y-E-P-[LIVM]-W-[SA]-I-G-T-[GK] [E is the active site residue]

[1] Lolis E., Alber T., Davenport R.C., Rose D., Hartman F.C., Petsko G.A. Biochemistry 29:6609-6618(1990). [2]
Knowles J.R. Nature 350:121-124(1991).

[1522] 640. Thymidine kinase cellular-type signature (TK)

Thymidine kinase (TK) is a ubiquitous enzyme that catalyzes the ATP-dependent phosphorylation of thymidine. A
comparison of TK sequences has shown [1,2,3] that there are two different families of TK. One of these families
consists of TK from the following sources: - Vertebrates. - Bacteria. - Bacteriophage T4. - Pox viruses. - African swine
fever virus (ASF). - Fish lymphocystis disease virus (FLDV). A conserved region which is located in the C-terminal
domains 8 and 9. Two patterns have been developed to detect this family of proteins. The first pattern is based on the
G-R-[KR] motif; but because this motif is too short to be specific to this family of proteins, a pattern from a larger region
centered on the second copy of this motif was derived. The second pattern is based on a number of conserved residues
which are located at the end of the fourth transmembrane segment and in the short loop region between the fourth
and fifth segments.

Consensus pattern: [LIVMSTAG]-[LIVMFSAG]-x(2)-[LIVMSA]-[DE]-x-[LIVMFYWA]-G-R-[RK]-x(4,6)-[GSTA]
Consensus pattern: [LIVMF]-x-G-[LIVMFA]-x(2)-G-x(8)-[LIFY]-x(2)-[EQ]-x(6)-[RK]

[1] Silverman M. Annu. Rev. Biochem. 60:757-794(1991). [2] Gould G.W., Bell G.I. Trends Biochem. Sci. 15:18-23(1990).
[3] Baldwin S.A. Biochim. Biophys. Acta 1154:17-49(1993). [4] Maiden M.C.J., Davis E.O., Baldwin S.A., Moore
D.C.M., Henderson P.J.F. Nature 325:641-643(1987). [5] Henderson P.J.F. Curr. Opin. Struct. Biol. 1:590-601(1991).
[6] Culham D.E., Lasby B., Marangoni A.G., Milner J.L., Steer B.A., van Nues R.W., Wood J.M. J. Mol. Biol.
229:268-276(1993).

[1515] 633. Synaptobrevin signature 

Synaptobrevin [1] is an intrinsic membrane protein of small synaptic vesicles whose function is not yet known, but
which is highly conserved in mammals, electric ray (where it is known as VAMP-1), Drosophila and yeast [2]. In yeast
there are two closely related forms of synaptobrevin (genes SNC1 and SNC2) while in mammals there are at least four
(genes SYB1, SYB2, SYB3 and SYBL1). Structurally, synaptobrevin consists of an N-terminal cytoplasmic domain of
90 to 110 residues, followed by a transmembrane region, and then by a short (2 to 22 residues) C-terminal
intravesicular domain. As a signature pattern for synaptobrevin, a highly conserved stretch of residues located in the
central part of the sequence was selected.

Consensus pattern: N-[LIVMHDENS]HKL]-V-x-[DEQ]-R-x(2)-IKRHLIVK41-(STDEhx-[LIV^^ 

[1] Suedhof T.C., Baumert M., Perin M.S., Jahn R. Neuron 2:1475-1481(1989). [2] Gerst J.E., Rodgers L., Riggs M.,
Wigler M. Proc. Natl. Acad. Sci. U.S.A. 89:4338-4342(1992).

[1516] 634. TBC domain. Identification of a TBC domain in GYP6_YEAST and GYP7_YEAST, which are GTPase
activator proteins of yeast Ypt6 and Ypt7, implies that these domains are GTPase activator proteins of Rab-like small
GTPases. Number of members: 55

[1] Medline: 96032578. Molecular cloning of a cDNA with a novel domain present in the tre-2 oncogene and the
yeast cell cycle regulators BUB2 and cdc16. Richardson PM, Zon LI; Oncogene 1995;11:1139-1148.
[2] Medline: 97398935. A shared domain between a spindle assembly checkpoint protein and Ypt/Rab-specific
GTPase-activators. Neuwald AF; Trends Biochem Sci 1997;22:243-244.

[1517] 635. Transcription factor TFIID repeat signature (TBP)

Transcription factor TFIID (or TATA-binding protein, TBP) [1,2] is a general factor that plays a major role in the activation
of eukaryotic genes transcribed by RNA polymerase II. TFIID binds specifically to the TATA box promoter element
which lies close to the position of transcription initiation. There is a remarkable degree of sequence conservation of a
C-terminal domain of about 180 residues in TFIID from various eukaryotic sources. This region is necessary and
sufficient for TATA box binding. The most significant structural feature of this domain is the presence of two conserved
repeats of a 77 amino-acid region. The intramolecular symmetry generates a saddle-shaped structure that sits astride
the DNA [3]. Drosophila TRF (TBP-related factor) [4] is a sequence-specific transcription factor that also binds to the
TATA box and is highly similar to TFIID. Archaebacteria also possess a TBP homolog [5]. A signature pattern that
spans the last 50 residues of the repeated region has been derived.

Consensus pattern: Y-x-P-x(2)-[IF]-x(2)-[LIVM](2)-x-[KRH]-x(3)-P-[RKQ]-x(3)-L-[LIVM]-F-x-[STN]-G-[KR]-[LIVM]-x(3)-G-[TAGL]-[KR]-x(7)-[AGC]-x(7)-[LIVMF]

[1] Hoffmann A., Sinn E., Yamamoto T., Wang J., Roy A., Horikoshi M., Roeder R.G. Nature 346:387-390(1990).
[2] Gasch A., Hoffmann A., Horikoshi M., Roeder R.G., Chua N.-H. Nature 346:390-394(1990).
[3] Nikolov D.B., Hu S.-H., Lin J., Gasch A., Hoffmann A., Horikoshi M., Chua N.-H., Roeder R.G., Burley S.K.
Nature 360:40-46(1992). [4] Crowley T.E., Hoey T., Liu J.-K., Jan Y.N., Jan L.Y., Tjian R. Nature 361:557-561(1993).
[5] Marsh T.L., Reich C.I., Whitelock R.B., Olsen G.J. Proc. Natl. Acad. Sci. U.S.A. 91:4180-4184(1994).

[1518] 636. Translationally controlled tumor protein signatures (TCTP) 

Mammalian translationally controlled tumor protein (TCTP) (or P23) is a protein which has been found to be
preferentially synthesized in cells during the early growth phase of some types of tumor [1,2], but which is also
expressed in normal cells. The physiological function of TCTP is still not known. It is a hydrophilic protein of 18 to
20 Kd. Close homologs have been found in plants [3], earthworm [4], Caenorhabditis elegans (F52H2.11), Hydra, budding
yeast (YKL056C) [5] and fission yeast (SpAC1F12.02c). Two of the best conserved regions have been selected as signature
patterns for TCTP.

Consensus pattern: [IFA]-[GA]-[GAS]-N-[PAK]-S-[GA]-E-[GDE]-[PAGE]-[DEQGA]

Consensus pattern: [FLVH]-[FY]-[IVCT]-G-E-x-[MA]-x(2,5)-[DEN]-[GAST]-x-[LV]-[AV]-x(3)-[FYW]
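
Because an element such as x(2,5) allows a spacer of variable length, a pattern of this kind does not correspond to a peptide of one fixed length. The following sketch (the function name `pattern_length_bounds` is a hypothetical helper introduced only for illustration) computes the shortest and longest stretch of residues that each TCTP pattern can cover:

```python
import re

# A pattern element (residue, 'x', [class] or {class}) with an optional
# repeat count: (n) for exactly n, (n,m) for n to m residues.
_ELEMENT = re.compile(r"(?:\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def pattern_length_bounds(pattern):
    """Return (shortest, longest) number of residues a pattern can span."""
    shortest = longest = 0
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        low = int(match.group(1)) if match.group(1) else 1
        high = int(match.group(2)) if match.group(2) else low
        shortest += low
        longest += high
    return shortest, longest


first = "[IFA]-[GA]-[GAS]-N-[PAK]-S-[GA]-E-[GDE]-[PAGE]-[DEQGA]"
second = ("[FLVH]-[FY]-[IVCT]-G-E-x-[MA]-x(2,5)-[DEN]-[GAST]-x-[LV]-"
          "[AV]-x(3)-[FYW]")
print(pattern_length_bounds(first))   # (11, 11)
print(pattern_length_bounds(second))  # (18, 21)
```

The first pattern always spans 11 residues, whereas the second spans 18 to 21 residues depending on the length of the x(2,5) spacer.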
Copper/Zinc superoxide dismutase (SODC) [1] is one of the three forms of an enzyme that catalyzes the dismutation
of superoxide radicals. SODC binds one atom each of zinc and copper. Various forms of SODC are known: a cytoplasmic
form in eukaryotes, an additional chloroplast form in plants, an extracellular form in some eukaryotes, and a periplasmic
form in prokaryotes. The metal binding sites are conserved in all the known SODC sequences [2]. Two signature
patterns have been derived for this family of enzymes: the first one contains two histidine residues that bind the copper
atom; the second one is located in the C-terminal section of SODC and contains a cysteine which is involved in a
disulfide bond.

Consensus pattern: [GA]-[IMFAT]-H-[LIVF]-H-x(2)-[GP]-[SDG]-x-[STAGDE] [The two H's are copper ligands]

Consensus pattern: G-[GN]-[SGA]-G-x-R-x-[SGA]-C-x(2)-[IV] [C is involved in a disulfide bond]
[1] Bannister J.V., Bannister W.H., Rotilio G. CRC Crit. Rev. Biochem. 22:111-154(1987). [2] Smith M.W., Doolittle
R.F. J. Mol. Evol. 34:175-184(1992).

[1510] 629. (sodfe) Manganese and iron superoxide dismutases signature 

Manganese superoxide dismutase (SODM) [1] is one of the three forms of an enzyme that catalyzes the dismutation
of superoxide radicals. The four ligands of the manganese atom are conserved in all the known SODM sequences.
These metal ligands are also conserved in the related iron form of superoxide dismutases [2,3]. A short conserved
region which includes two of the four ligands, an aspartate and a histidine, has been selected as a signature pattern.
Consensus pattern: D-x-W-E-H-[STA]-[FY](2) [D and H are manganese/iron ligands]

[1] Bannister J.V., Bannister W.H., Rotilio G. CRC Crit. Rev. Biochem. 22:111-154(1987). [2] Parker M.W., Blake
C.C.F. FEBS Lett. 229:377-382(1988). [3] Smith M.W., Doolittle R.F. J. Mol. Evol. 34:175-184(1992).
[1511] 630. Spectrin repeat 

[1512] Spectrin repeats are found in several proteins involved in cytoskeletal structure. These include spectrin,
alpha-actinin and dystrophin. The sequence repeat used in this family is taken from the structural repeat in reference
[2]. The spectrin repeat forms a three helix bundle. The second helix is interrupted by proline in some sequences.
Number of members: 898

[1] Actin-binding proteins. 1: Spectrin super family. Hartwig JH; Protein Profile 1995;2:732-732. [2] Crystal
structure of the repetitive segments of spectrin. Yan Y, Winograd E, Viel A, Cronin T, Harrison SC, Branton D; Science
1993;262:2027-2030.

[1513] 631. (subtilase) Streptomyces subtilisin-type inhibitors signature

Bacteria of the Streptomyces family produce a family of proteinase inhibitors [1] characterized by their strong activity
toward subtilisin. They are collectively known as SSI's: Streptomyces Subtilisin Inhibitors. Some SSI's also inhibit
trypsin or chymotrypsin. In their mature secreted form, SSI's are proteins of about 110 residues with two conserved
disulfide bonds.

[Schematic representation of an SSI sequence, showing the two disulfide bonds, the active site residue and the position
of the pattern.]
'C': conserved cysteine involved in a disulfide bond. '#': active site residue. '*': position of the pattern.

Consensus pattern: C-x-P-x(2,3)-G-x-H-P-x(4)-A-C-[ATD]-x-L [The two C's are involved in a disulfide bond]
[1] Taguchi S., Kojima S., Terabe M., Miura K.-I., Momose H. Eur. J. Biochem. 220:911-918(1994).
[1514] 632. Sugar transport proteins signatures

In mammalian cells the uptake of glucose is mediated by a family of closely related transport proteins which are called
the glucose transporters [1,2,3]. At least seven of these transporters are currently known to exist (in Human they are
encoded by the GLUT1 to GLUT7 genes). These integral membrane proteins are predicted to comprise twelve
membrane spanning domains. The glucose transporters show sequence similarities [4,5] with a number of other sugar or
metabolite transport proteins listed below (references are only provided for recently determined sequences). -
Escherichia coli arabinose-proton symport (araE). - Escherichia coli galactose-proton symport (galP). - Escherichia coli
and Klebsiella pneumoniae citrate-proton symport (also known as citrate utilization determinant) (gene cit). -
Escherichia coli alpha-ketoglutarate permease (gene kgtP). - Escherichia coli proline/betaine transporter (gene proP) [6].
- Escherichia coli xylose-proton symport (xylE). - Zymomonas mobilis glucose facilitated diffusion protein (gene glf). -
Yeast high and low affinity glucose transport proteins (genes SNF3, HXT1 to HXT14). - Yeast galactose transporter
(gene GAL2). - Yeast maltose permeases (genes MAL3T and MAL6T). - Yeast myo-inositol transporters (genes ITR1
and ITR2). - Yeast carboxylic acid transporter protein homolog JEN1. - Yeast inorganic phosphate transporter (gene
PHO84). - Kluyveromyces lactis lactose permease (gene LAC12). - Neurospora crassa quinate transporter (gene Qa-y),
and Emericella nidulans quinate permease (gene qutD). - Chlorella hexose carrier (gene HUP1). - Arabidopsis
thaliana glucose transporter (gene STP1). - Spinach sucrose transporter. - Leishmania donovani transporters D1 and
D2. - Leishmania enriettii probable transport protein (LTP). - Yeast hypothetical proteins YBR241c, YCR98c and
YFL040W. - Caenorhabditis elegans hypothetical protein ZK637.1. - Escherichia coli hypothetical proteins yabE, ydiE
and yhjE. - Haemophilus influenzae hypothetical proteins HI0281 and HI0418. - Bacillus subtilis hypothetical proteins
yxbC and yxdF. It has been suggested [4] that these transport proteins have evolved from the duplication of an ancestral
protein with six transmembrane regions; this hypothesis is based on the conservation of two G-R-[KR] motifs. The first
one is located between the second and third transmembrane domains and the second one between transmembrane
Consensus pattern: [GS]-x-[LIVMF]-x(2)-A-[DNEQASH]-[GNEK]-G-[STIM]-[LIVMFY](3)-[DE]-[EK]-[LIVM]
Consensus pattern: [FYW]-P-[GS]-N-[LIVM]-R-[EQ]-L-x-[NHAT]

[1] Morett E., Segovia L. J. Bacteriol. 175:6067-6074(1993). [2] Austin S., Kundrot C., Dixon R. Nucleic Acids Res.
19:2281-2287(1991). [3] Albright L.M., Huala E., Ausubel F.M. Annu. Rev. Genet. 23:311-336(1989). [4] Austin S.,
Dixon R. EMBO J. 11:2219-2228(1992).
[1506] 625. Sigma-70 factors family signatures

Sigma factors [1] are bacterial transcription initiation factors that promote the attachment of the core RNA polymerase
to specific initiation sites and are then released. They alter the specificity of promoter recognition. Most bacteria express
a multiplicity of sigma factors. Two of these factors, sigma-70 (gene rpoD), generally known as the major or primary
sigma factor, and sigma-54 (gene rpoN or ntrA) direct the transcription of a wide variety of genes. The other sigma
factors, known as alternative sigma factors, are required for the transcription of specific subsets of genes. With regard
to sequence similarity, sigma factors can be grouped into two classes: the sigma-54 and sigma-70 families. The
sigma-70 family includes, in addition to the primary sigma factor, a wide variety of sigma factors, some of which are listed
below: - Bacillus sigma factors involved in the control of sporulation-specific genes: sigma-E (sigE or spoIIGB),
sigma-F (sigF or spoIIAC), sigma-G (sigG or spoIIIG), sigma-H (sigH or spo0H) and sigma-K (sigK or spoIVCB/spoIIIC). -
Escherichia coli and related bacteria sigma-32 (gene rpoH or htpR) involved in the expression of heat shock genes. -
Escherichia coli and related bacteria sigma-27 (gene fliA) involved in the expression of the flagellin gene. - Escherichia
coli sigma-S (gene rpoS or katF) which seems to be involved in the expression of genes required for protection against
external stresses. - Myxococcus xanthus sigma-B (sigB) which is essential for the late-stage differentiation of that
bacterium. Alignments of the sigma-70 family permit the identification of four regions of high conservation [2,3]. Each of
these four regions can in turn be subdivided into a number of sub-regions. Signature patterns based on the two
best-conserved sub-regions have been developed. The first pattern corresponds to sub-region 2.2; the exact function of this
sub-region is not known although it could be involved in the binding of the sigma factor to the core RNA polymerase.
The second pattern corresponds to sub-region 4.2 which seems to harbor a DNA-binding 'helix-turn-helix' motif involved
in binding the conserved -35 region of promoters recognized by the major sigma factors. The second pattern starts one
residue before the N-terminal extremity of the HTH region and ends six residues after its C-terminal extremity.
Consensus pattern: [DE]-[LIVMF](2)-[HEQS]-x-G-x-[LIVMFA]-G-L-[LIVMFYE]-x-[GSAM]-[LIVMAP]

Consensus pattern: [STN]-x(2)-[DEQ]-[LIVM]-[GAS]-x(4)-[LIVMF]-[PSTG]-x(3)-[LIVMA]-x-[NQR]-[LIVMA]-[EQH]-x(3)-[LIVMFW]-x(2)-[LIVM]

[1] Helmann J.D., Chamberlin M.J. Annu. Rev. Biochem. 57:839-872(1988). [2] Gribskov M., Burgess R.R. Nucleic
Acids Res. 14:6745-6763(1986). [3] Lonetto M.A., Gribskov M., Gross C.A. J. Bacteriol. 174:3843-3849(1992). [4]
Lonetto M.A., Brown K.L., Rudd K.E., Buttner M.J. Proc. Natl. Acad. Sci. U.S.A. 91:7573-7577(1994).
[1507] 626. Signal carboxyl-terminal domain. 430 members. 
[1508] 627. Signal peptidases I signatures 

Signal peptidases (SPases) [1] (also known as leader peptidases) remove the signal peptides from secretory proteins.
In prokaryotes three types of SPases are known: type I (gene lepB) which is responsible for the processing of the
majority of exported pre-proteins; type II (gene lsp) which only processes lipoproteins, and a third type involved in the
processing of pili subunits. SPase I is an integral membrane protein that is anchored in the cytoplasmic membrane by
one (in B. subtilis) or two (in E. coli) N-terminal transmembrane domains with the main part of the protein protruding in
the periplasmic space. Two residues have been shown [2,3] to be essential for the catalytic activity of SPase I: a serine
and a lysine. SPase I is evolutionarily related to the yeast mitochondrial inner membrane protease subunits 1 and 2
(genes IMP1 and IMP2) which catalyze the removal of signal peptides required for the targeting of proteins from the
mitochondrial matrix, across the inner membrane, into the inter-membrane space [4]. In eukaryotes the removal of
signal peptides is effected by an oligomeric enzymatic complex composed of at least five subunits: the signal peptidase
complex (SPC). The SPC is located in the endoplasmic reticulum membrane. Two components of mammalian SPC,
the 18 Kd (SPC18) and the 21 Kd (SPC21) subunits, as well as the yeast SEC11 subunit have been shown [5] to share
regions of sequence similarity with prokaryotic SPases I and yeast IMP1/IMP2. Three signature patterns for these
proteins have been developed. The first signature contains the putative active site serine, the second signature contains
the putative active site lysine which is not conserved in the SPC subunits, and the third signature corresponds to a
conserved region of unknown biological significance which is located in the C-terminal section of all these proteins.
Consensus pattern: [GS]-x-S-M-x-[PS]-[AT]-[LF] [S is an active site residue]

Consensus pattern: K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY] [K is an active site residue]
Consensus pattern: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG]

[1] Dalbey R.E., von Heijne G. Trends Biochem. Sci. 17:474-478(1992). [2] Sung M., Dalbey R.E. J. Biol. Chem.
267:13154-13159(1992). [3] Black M.T. J. Bacteriol. 175:4957-4961(1993). [4] Nunnari J., Fox T.D., Walter P. Science
262:1997-2004(1993). [5] van Dijl J.M., de Jong A., Vehmaanpera J., Venema G., Bron S. EMBO J. 11:2819-2828(1992).
[6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).
[1509] 628. (sodcu) Copper/Zinc superoxide dismutase signatures
family are listed below (references are only provided for recently determined sequences): - Alpha-1 protease inhibitor
(alpha-1-antitrypsin, contrapsin). - Alpha-1-antichymotrypsin. - Antithrombin III. - Alpha-2-antiplasmin. - Heparin
cofactor II. - Complement C1 inhibitor. - Plasminogen activator inhibitors 1 (PAI-1) and 2 (PAI-2). - Glia derived nexin
(GDN) (Protease nexin I). - Protein C inhibitor. - Rat hepatocytes SPI-1, SPI-2 and SPI-3 inhibitors. - Human squamous
cell carcinoma antigen (SCCA) which may act in the modulation of the host immune response against tumor cells. - A
lepidopteran protease inhibitor. - Leukocyte elastase inhibitor which, in contrast to other serpins, is an intracellular
protein. - Neuroserpin [5], a neuronal inhibitor of plasminogen activators and plasmin. - Cowpox virus crmA [6], an
inhibitor of the thiol protease interleukin-1B converting enzyme (ICE). CrmA is the only serpin known to inhibit a
non-serine proteinase. - Some orthopoxviruses probable protease inhibitors, which may be involved in the regulation of the
blood clotting cascade and/or of the complement cascade in the mammalian host. On the basis of strong sequence
similarities, a number of proteins with no known inhibitory activity are said to belong to this family: - Birds ovalbumin
and the related genes X and Y proteins. - Angiotensinogen; the precursor of the angiotensin active peptide. - Barley
protein Z; the major endosperm albumin. - Corticosteroid binding globulin (CBG). - Thyroxine-binding globulin (TBG).
- Sheep uterine milk protein (UTMP) and pig uteroferrin-associated protein (UFAP). - Hsp47, an endoplasmic reticulum
heat-shock protein that binds strongly to collagen and could act as a chaperone in the collagen biosynthetic pathway
[7]. - Maspin, which seems to function as a tumor suppressor [8]. - Pigment epithelium-derived factor precursor (PEDF),
a protein with a strong neurotrophic activity [9]. - Ep45, an estrogen-regulated protein from Xenopus [10]. A signature
pattern has been developed for this family of proteins, centered on a well conserved Pro-Phe sequence which is found
ten to fifteen residues on the C-terminal side of the reactive bond.

[1504] Consensus pattern: [LIVMFY]-x-[LIVMFYAC]-[DNQ]-[RKHQS]-[PST]-F-[LIVMFY]-[LIVMFYC]-x-[LIVMFAH]
[1] Carrell R., Travis J. Trends Biochem. Sci. 10:20-24(1985). [2] Carrell R., Pemberton P.A., Boswell D.R. Cold Spring
Harbor Symp. Quant. Biol. 52:527-535(1987). [3] Huber R., Carrell R.W. Biochemistry 28:8951-8966(1989). [4]
Remold-O'Donnell E. FEBS Lett. 315:105-108(1993). [5] Osterwalder T., Contartese J., Stoeckli E.T., Kuhn T.B.,
Sonderegger P. EMBO J. 15:2944-2953(1996). [6] Komiyama T., Ray C.A., Pickup D.J., Howard A.D., Thornberry N.A.,
Peterson E.P., Salvesen G. J. Biol. Chem. 269:19331-19337(1994). [7] Clarke E., Sandwal B.D. Biochim. Biophys.
Acta 1129:246-248(1992). [8] Zou Z., Anisowicz A., Neveu M., Rafidi K., Sheng S., Sager R., Hendrix M.J., Seftor E.,
Thor A. Science 263:526-529(1994). [9] Steele F.R., Chader G.J., Johnson L.V., Tombran-Tink J. Proc. Natl. Acad.
Sci. U.S.A. 90:1526-1530(1993). [10] Holland L.J., Suksang C., Wall A.A., Roberts L.R., Moser D.R., Bhattacharya A.
J. Biol. Chem. 267:7053-7059(1992).
[1505] 624. Sigma-54 interaction domain signatures and profile

Some bacterial regulatory proteins activate the expression of genes from promoters recognised by core RNA
polymerase associated with the alternative sigma-54 factor. These have a conserved domain of about 230 residues involved
in the ATP-dependent [1,2] interaction with sigma-54. This domain has been found in the proteins listed below: - acoR
from Alcaligenes eutrophus, an activator of the acetoin catabolism operon acoXABC. - algB from Pseudomonas
aeruginosa, an activator of alginate biosynthetic gene algD. - dctD from Rhizobium, an activator of dctA, the
C4-dicarboxylate transport protein. - dhaR from Citrobacter freundii, a regulator of the dha operon for glycerol utilization. -
fhlA from Escherichia coli, an activator of the formate dehydrogenase H and hydrogenase III structural genes. - flbD from
Caulobacter crescentus, an activator of flagellar genes. - hoxA from Alcaligenes eutrophus, an activator of the hydrogenase
operon. - hrpS from Pseudomonas syringae, an activator of hprD as well as other hrp loci involved in plant pathogenicity.
- hupR1 from Rhodobacter capsulatus, an activator of the [NiFe] hydrogenase genes hupSL. - hydG from Escherichia
coli and Salmonella typhimurium, an activator of the hydrogenase activity. - levR from Bacillus subtilis, which regulates
the expression of the levanase operon (levDEFG and sacC). - nifA (as well as anfA and vnfA) from various bacteria,
an activator of the nif nitrogen-fixing operon. - ntrC, from various bacteria, an activator of nitrogen assimilatory genes
such as that for glutamine synthetase (glnA) or of the nif operon. - pgtA from Salmonella typhimurium, the activator of
the inducible phosphoglycerate transport system. - pilR from Pseudomonas aeruginosa, an activator of pilin gene
transcription. - rocR from Bacillus subtilis, an activator of genes for arginine utilization. - tyrR from Escherichia coli,
involved in the transcriptional regulation of aromatic amino-acid biosynthesis and transport. - wtsA, from Erwinia
stewartii, an activator of plant pathogenicity gene wtsB. - xylR from Pseudomonas putida, the activator of the tol plasmid
xylene catabolism operon xylCAB and of xylS. - Escherichia coli hypothetical protein yfhA. - Escherichia coli
hypothetical protein yhgB. About half of these proteins (algB, dctD, flbD, hoxA, hupR1, hydG, ntrC, pgtA and pilR) belong to
signal transduction two-component systems [3] and possess a domain that can be phosphorylated by a sensor-kinase
protein in their N-terminal section. Almost all of these proteins possess a helix-turn-helix DNA-binding domain in their
C-terminal section. The domain which interacts with the sigma-54 factor has an ATPase activity. This may be required
to promote a conformational change necessary for the interaction [4]. The domain contains an atypical ATP-binding
motif A (P-loop) as well as a form of motif B. The two ATP-binding motifs are located in the N-terminal section of the
domain; signature patterns have been developed for both motifs. Other regions of the domain are also conserved. One
of them, located in the C-terminal section, has been selected as a third signature pattern.
Consensus pattern: [LIVMFY](3)-x-G-[DEQ]-[STE]-G-[STAV]-G-K-x(2)-[LIVMFY]






[1498] 620. Protein secY signatures 

The eubacterial secY protein [1] plays an important role in protein export. It interacts with the signal sequences of
secretory proteins as well as with two other components of the protein translocation system: secA and secE. SecY is
an integral plasma membrane protein of 419 to 492 amino acid residues that apparently contains ten transmembrane
segments. Such a structure probably confers to secY a 'translocator' function, providing a channel for periplasmic and
outer-membrane precursor proteins. Homologs of secY are found in archaebacteria [2]. SecY is also encoded in the
chloroplast genome of some algae [3] where it could be involved in a prokaryotic-like protein export system across the
two membranes of the chloroplast endoplasmic reticulum (CER) which is present in chromophyte and cryptophyte algae.
Two signature patterns have been developed for secY proteins. The first corresponds to the second transmembrane
region, which is the most conserved section of these proteins. The second spans the C-terminal part of the fourth
transmembrane region, a short intracellular loop, and the N-terminal part of the fifth transmembrane region.
Consensus pattern: [GST]-[LIVMF](2)-x-[LIVM]-G-[LIVM]-x-P-[LIVMFY](2)-x-[AS]-[GSTQ]-[LIVMF](2)

Consensus pattern: [LIVMFYW](2)-x-[DE]-x-[LIVMF]-[STN]-x(2)-G-[LIVMF]-[GST]-[NST]-G-x-[GST]-[LIVMF](3)

[1] Ito K. Mol. Microbiol. 6:2423-2428(1992). [2] Auer J., Spicker G., Boeck A. Biochimie 73:683-688(1991). [3] Douglas
S.E. FEBS Lett. 298:93-96(1992).

[1499] 621. (Seed protein) Small hydrophilic plant seed proteins signature. The following small hydrophilic plant seed
proteins are structurally related: - Arabidopsis thaliana proteins GEA1 and GEA6. - Cotton late embryogenesis abundant
(LEA) protein D-19. - Carrot EMB-1 protein. - Barley LEA proteins B19.1A, B19.1B, B19.3 and B19.4. - Maize late
embryogenesis abundant protein Emb564. - Radish late seed maturation protein p8B6. - Rice embryonic abundant
protein Emp1. - Sunflower 10 Kd late embryogenesis abundant protein (DS10). - Wheat Em proteins. These proteins
contain from 83 to 153 amino acid residues and may play a role [1,2] in equipping the seed for survival, maintaining
a minimal level of hydration in the dry organism and preventing the denaturation of cytoplasmic components. They
may also play a role during imbibition by controlling water uptake. As a signature pattern, the best conserved region
in the sequence of these proteins has been selected; it is a glycine-rich nonapeptide located in the N-terminal section.
[1500] Consensus pattern: G-[EQ]-T-V-V-P-G-G-T
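
Consensus patterns written in this notation can be checked against a sequence mechanically. The following is a minimal sketch, assuming the conventions that the patterns in this section appear to follow (x for any residue, square brackets for alternative residues, a number in parentheses for a repeat count). The function name prosite_to_regex and the test peptide are illustrative assumptions and are not part of the disclosure.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a consensus pattern into a Python regular expression.

    Handles only the constructs used in this section: single residues,
    x (any residue), [ABC] alternatives, and (n) or (n,m) repeat counts.
    """
    parts = []
    for element in pattern.strip().rstrip("-").split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        regex = "." if core == "x" else core
        if lo is not None:
            # (n) becomes {n}; (n,m) becomes {n,m}
            regex += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(regex)
    return "".join(parts)

seed_regex = prosite_to_regex("G-[EQ]-T-V-V-P-G-G-T")
# Hypothetical peptide, made up for illustration only.
hit = re.search(seed_regex, "MASGQTVVPGGTGGKS")
print(seed_regex, hit.start() if hit else None)
```

The sketch rejects any element it does not understand instead of guessing, which seems the safer choice when a pattern may have been damaged in transcription.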

[1501] [1] Dure L. III, Crouch M., Harada J., Ho T.-H.D., Mundy J., Quatrano R., Thomas T., Sung Z.R. Plant Mol.
Biol. 12:475-486(1989). [2] Gaubier P., Raynal M., Hull G., Huestis G.M., Grellet F., Arenas C., Pages M., Delseny M.
Mol. Gen. Genet. 238:409-418(1993).
[1502] 622. Serine carboxypeptidases, active sites

All known carboxypeptidases are either metallo carboxypeptidases or serine carboxypeptidases. The catalytic activity
of the serine carboxypeptidases, like that of the trypsin family serine proteases, is provided by a charge relay system
involving an aspartic acid residue hydrogen-bonded to a histidine, which is itself hydrogen-bonded to a serine [1].
Proteins known to be serine carboxypeptidases are: - Barley and wheat serine carboxypeptidases I, II, and III [2]. -
Yeast carboxypeptidase Y (YSCY) (gene PRC1), a vacuolar protease involved in degrading small peptides. - Yeast
KEX1 protease, involved in killer toxin and alpha-factor precursor processing. - Fission yeast sxa2, a probable
carboxypeptidase involved in degrading or processing mating pheromones [3]. - Penicillium janthinellum carboxypeptidase
S1 [4]. - Aspergillus niger carboxypeptidase pepF. - Aspergillus satoi carboxypeptidase cpdS. - Vertebrate protective
protein / cathepsin A [5], a lysosomal protein which is not only a carboxypeptidase but also essential for the activity of
both beta-galactosidase and neuraminidase. - Mosquito vitellogenic carboxypeptidase (VCP) [6]. - Naegleria fowleri
virulence-related protein Nf314 [7]. - Yeast hypothetical protein YBR139w. - Caenorhabditis elegans hypothetical
proteins C08H9.1, F13D12.6, F32A5.3, F41C3.5 and K10B2.2. This family also includes: - Sorghum (S)-hydroxymandelonitrile
lyase (hydroxynitrile lyase) (HNL) [8], an enzyme involved in plant cyanogenesis. The sequences surrounding
the active site serine and histidine residues are highly conserved in all these serine carboxypeptidases.
Consensus pattern: [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS] [S is the active site residue]

Consensus pattern: [LIVF]-x(2)-[LIVSTA]-x-[IVPST]-x-[GSDNQL]-[SAGV]-[SG]-H-x-[IVAQ]-P-x(3)-[PSA] [H is the
active site residue]
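
Because each of these patterns names its active site residue, a match can be used to report where that residue falls in a sequence. The sketch below does this for the serine pattern only. It assumes a hand-written regular expression equivalent to [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS]; the function name and the test fragment are hypothetical and were invented for illustration.

```python
import re

# Regex form of [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS]; the capture group marks the
# active-site serine named in the pattern annotation.
SER_SITE = re.compile(r"[LIVM].[GTA]E(S)Y[AG][GS]")

def active_site_positions(sequence: str) -> list:
    """Return 1-based positions of the putative active-site serine."""
    # finditer reports non-overlapping matches, scanning left to right.
    return [m.start(1) + 1 for m in SER_SITE.finditer(sequence)]

# Hypothetical fragment, invented for illustration; not a real protein sequence.
print(active_site_positions("AAKLFGESYAGKK"))
```

A match only indicates that the local sequence fits the signature; it does not by itself show that the protein is an active carboxypeptidase.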

[1] Liao D.I., Remington S.J. J. Biol. Chem. 265:6528-6531(1990). [2] Sorensen S.B., Svendsen I., Breddam K.
Carlsberg Res. Commun. 54:193-202(1989). [3] Imai Y., Yamamoto M. Mol. Cell. Biol. 12:1827-1834(1992). [4] Svendsen
I., Hofmann T., Endrizzi J., Remington J., Breddam K. FEBS Lett. 333:39-43(1993). [5] Galjart N.J., Morreau
H., Willemsen R., Gillemans N., Bonten E.J., d'Azzo A. J. Biol. Chem. 266:14754-14762(1991). [6] Cho W.L., Deitsch
K.W., Raikhel A.S. Proc. Natl. Acad. Sci. U.S.A. 88:10821-10824(1991). [7] Hu W.N., Kopachik W., Band R.N. Infect.
Immun. 60:2418-2424(1992). [8] Wajant H., Mundry K.W., Pfizenmaier K. Plant Mol. Biol. 26:735-746(1994). [9] Rawlings
N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1503] 623. Serpins signature. Serpins (SERine Proteinase INhibitors) [1,2,3,4] are a group of structurally related 
proteins. They are high molecular weight (400 to 500 amino acids), extracellular, irreversible serine protease inhibitors
with a well defined structural-functional characteristic: a reactive region that acts as a 'bait' for an appropriate serine
protease. This region is found in the C-terminal part of these proteins. Proteins which are known to belong to the serpin 
Sucrose synthases catalyse the synthesis of sucrose from UDP-glucose and fructose. This family includes the bulk of
the sucrose synthase protein. However the carboxyl terminal region of the sucrose synthases belongs to the glycosyl
transferase family Glycos_transf_1.
[1490] 614. Sulfotransferase proteins
Number of members: 59 

[1491] 615. Synaptophysin / synaptoporin signature

Synaptophysin and synaptoporin [1] are structurally related proteins, found in the membrane of synaptic vesicles, which
may function as ionic or solute channels. These two glycoproteins seem to span the membrane four times. Both their
N- and C-terminal sequences seem to be cytoplasmically located. As a signature pattern for this family of proteins, a
highly conserved region located in the beginning of the first intravesicular loop just after the first transmembrane domain
has been selected. This region contains a cysteine residue that may be involved in a disulfide bond.
Consensus pattern: L-S-V-[DE]-C-x-N-K-T [C may be involved in a disulfide bond]
[1] Knaus P., Marqueze-Pouey B., Scherer H., Betz H. Neuron 5:453-462(1990).
[1492] 616. Syndecans signature 

Syndecans [1,2] (from the Greek syndein; to bind together) are a family of transmembrane heparan sulfate proteoglycans
which are implicated in the binding of extracellular matrix components and growth factors. Syndecans bind a
variety of molecules via their heparan sulfate chains and can act as receptors or as co-receptors. Structurally, these
proteins consist of four separate domains: a) A signal sequence; b) An extracellular domain (ectodomain) of variable
length and whose sequence is not evolutionarily conserved in the various forms of syndecans. The ectodomain contains
the sites of attachment of the heparan sulfate glycosaminoglycan side chains; c) A transmembrane region; d) A highly
conserved cytoplasmic domain of about 30 to 35 residues which could interact with cytoskeletal proteins. The proteins
known to belong to this family are: - Syndecan 1. - Syndecan 2 or fibroglycan. - Syndecan 3 or neuroglycan or
N-syndecan. - Syndecan 4 or amphiglycan or ryudocan. - Drosophila syndecan. - Caenorhabditis elegans probable
syndecan (F57C7.3). The signature pattern that has been developed for syndecans starts with the last residue of the
transmembrane region and includes the first 10 residues of the cytoplasmic domain. This region, which contains four
basic residues, could act as a stop transfer site.
Consensus pattern: [FY]-R-[IM]-[KR]-K(2)-D-E-G-S-Y
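
The description of this signature (one transmembrane residue followed by the first 10 cytoplasmic residues, with four basic residues among them) can be cross-checked against the pattern itself. The following sketch expands the pattern into one entry per residue position and counts the positions restricted to Lys or Arg. The helper name expand_positions is an assumption made for this illustration.

```python
import re

SYNDECAN_PATTERN = "[FY]-R-[IM]-[KR]-K(2)-D-E-G-S-Y"

def expand_positions(pattern: str) -> list:
    """Return one entry per residue position, e.g. 'K(2)' becomes ['K', 'K']."""
    positions = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, count = m.group(1), int(m.group(2) or 1)
        positions.extend([core.strip("[]")] * count)
    return positions

positions = expand_positions(SYNDECAN_PATTERN)
# A position is counted as basic when every allowed residue is Lys or Arg.
basic = [p for p in positions if set(p) <= {"K", "R"}]
print(len(positions), len(basic))
```

The two counts printed are the total number of positions covered by the pattern and the number of positions that admit only basic residues, which can be compared with the figures given in the text.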

[1] Bernfield M., Kokenyesi R., Kato M., Hinkes M.T., Spring J., Gallo R.L., Lose E.J. Annu. Rev. Cell Biol. 8:365-393
(1992). [2] David G. FASEB J. 7:1023-1030(1993).
[1493] 617. Syntaxin / epimorphin family signature

The following proteins have been shown to be evolutionarily related [1,2,3]: - Epimorphin (or syntaxin 2), a mammalian
mesenchymal protein which plays an essential role in epithelial morphogenesis. - Syntaxin 1A (also known as antigen
HPC-1) and syntaxin 1B which are synaptic proteins which may be involved in docking of synaptic vesicles at
presynaptic active zones. - Syntaxin 3. - Syntaxin 4, which is potentially involved in docking of synaptic vesicles at presynaptic
active zones. - Syntaxin 5, which mediates endoplasmic reticulum to Golgi transport. - Syntaxin 6, which is involved in
intracellular vesicle trafficking. - Syntaxin 7. - Yeast PEP12 (or VPS6) which is required for the transport of proteases
to the vacuole. - Yeast SED5 which is required for the fusion of transport vesicles with the Golgi complex. - Yeast SSO1
and SSO2 which are required for vesicle fusion with the plasma membrane. - Yeast VAM3, which is required for vacuolar
assembly. - Arabidopsis thaliana protein KNOLLE which may be involved in cytokinesis. - Caenorhabditis elegans
hypothetical proteins F35C8.4, F48F7.2, F55A11.2 and T01B11.3. The above proteins share the following characteristics:
a size ranging from 30 Kd to 40 Kd; a C-terminal extremity which is highly hydrophobic and is probably involved
in anchoring the protein to the membrane; a central, well conserved region, which seems to be in a coiled-coil
conformation. The pattern specific for this family is based on the most conserved region of the coiled coil domain.
Consensus pattem: [RQ]-x(3)-[LlVMA]-x(2)-fLIVI^-IESH]-x(2)4UVMT]-x-[DEVI^.[U\^|.x<2)-{Ll\«^ 
ILIVM]-x(3)-[LIVT].x(2)-Q- lGADEQ]-x(2)-[UVM]-[DNQT]-x-{LIVMF]-pESV]-x(2)-IUVMJ 

[1] Bennett M.K., Garcia-Arraras J.E., Elferink L.A., Peterson K., Fleming A.M., Hazuka C.D., Scheller R.H. Cell 74:
863-873(1993). [2] Spring J., Kato M., Bernfield M. Trends Biochem. Sci. 18:124-125(1993). [3] Pelham H.R.B. Cell
73:425-426(1993).

[1494] 618. Sm protein 

The U1, U2, U4/U6, and U5 small nuclear ribonucleoprotein particles (snRNPs) involved in pre-mRNA splicing contain
seven Sm proteins (B/B', D1, D2, D3, E, F and G) in common, which assemble around the Sm site present in four of
the major spliceosomal small nuclear RNAs. These proteins contain a common sequence motif in two segments, Sm1
and Sm2, separated by a short variable linker.

[1495] [1] Hermann H, Fabrizio P, Raker VA, Foulaki K, Hornig H, Brahms H, Luhrmann R; EMBO J 1995;14:
2076-2088. [2] Kambach C, Walke S, Young R, Avis JM, de la Fortelle E, Raker VA, Luhrmann R, Li J, Nagai K; Cell
1999;96:375-387.
[1496] 619. Skp1 family

[1497] [1] Stebbins CE, Kaelin WG Jr, Pavletich NP; Science 1999;284:455-461.
[1480] [1] The three-dimensional structure of canavalin from jack bean (Canavalia ensiformis). Ko TP, Ng JD,
McPherson A; Plant Physiol 1993;101:729-744.

[1481] 608. Aspartate-semialdehyde dehydrogenase signature

Aspartate-semialdehyde dehydrogenase (ASD) catalyzes the second step in the common biosynthetic pathway leading
from Asp to diaminopimelate and Lys, to Met, and to Thr: the NADP-dependent reductive dephosphorylation of
L-aspartyl phosphate to L-aspartate-semialdehyde. In bacteria and fungi, ASD is a protein of about 40 Kd (340 to 370
residues) whose sequence is not extremely well conserved [1]. A conserved cysteine residue has been implicated as
important for the catalytic activity [2]. The region of conservation around the active site residue is too small to be used
as signature pattern. Another more conserved region, located in the last third of the sequence, and which contains
both a conserved cysteine as well as a histidine, has been used instead.
Consensus pattern: [LIVM]-[SADN]-x(2)-C-x-R-[LIVM]-x(4)-[GSC]-H-[STA]

[1] Baril C., Richaud C., Fournie E., Baranton G., Saint Girons I. J. Gen. Microbiol. 138:47-53(1992). [2] Karsten W.E.,
Viola R.E. Biochim. Biophys. Acta 1121:234-238(1992).

[1482] N-acetyl-gamma-glutamyl-phosphate reductase active site

N-acetyl-gamma-glutamyl-phosphate reductase (EC 1.2.1.38) (AGPR) [1,2] is the enzyme that catalyzes the third step
in the biosynthesis of arginine from glutamate, the NADP-dependent reduction of N-acetyl-5-glutamyl phosphate into
N-acetylglutamate 5-semialdehyde. In bacteria it is a monofunctional protein of 35 to 38 Kd (gene argC) while in fungi
it is part of a bifunctional mitochondrial enzyme (gene ARG5,6, arg11 or arg-6) which contains an N-terminal
acetylglutamate kinase (EC 2.7.2.8) domain and a C-terminal AGPR domain. In the Escherichia coli enzyme, a cysteine
has been shown to be implicated in the catalytic activity; the region around this residue is well conserved and can be
used as a signature pattern.

Consensus pattern: [LIVM]-[GSA]-x-P-G-C-[FY]-[AVP]-T-[GA]-x(3)-[GTAC]-[LIVM]-x-P [C is the active site residue]

[1] Ludovice M., Martin J.F., Carrachas P., Liras P. J. Bacteriol. 174:4606-4613(1992). [2] Gessert S.F., Kim J.H.,
Nargang F.E., Weiss R.L. J. Biol. Chem. 269:8189-8203(1994).

[1483] 609. Sialyltransferase family.

Number of members: 18 

[1484] 610. SpoU rRNA Methylase family 

This family of proteins probably use S-AdoMet. Number of members: 58 

[1] SpoU protein of Escherichia coli belongs to a new family of putative rRNA methylases. Koonin EV, Rudd KE; Nucleic
Acids Res 1993;21:5519-5519. [2] The spoU gene of Escherichia coli, the fourth gene of the spoT operon, is essential
for tRNA (Gm18) 2'-O-methyltransferase activity. Persson BC, Jager G, Gustafsson C; Nucleic Acids Res 1997;25:
4093-4097.

[1485] 611. Stathmin family signatures

Stathmin [1] (from the Greek 'stathmos' which means relay), is a ubiquitous intracellular protein, present in a variety
of phosphorylated forms and which serves as a relay for diverse second messenger pathways. Its expression and
phosphorylation are regulated throughout development and in response to extracellular signals regulating cell
proliferation, differentiation and function. Stathmin is a highly conserved protein of 149 amino acid residues. Structurally, it
consists of an N-terminal domain of about 45 residues followed by a 78 residue alpha-helical domain consisting of a
heptad repeat coiled coil structure and a C-terminal domain of 25 residues. Protein SCG10 is a neuron-specific,
membrane-associated protein that accumulates in the growth cones of developing neurons. It is highly similar in its sequence
to stathmin, but differs in that it contains an additional N-terminal hydrophobic segment of 32 residues which is probably
responsible for its interaction with membranes. Xenopus protein XB3 is also evolutionarily related to stathmin and also
contains an additional N-terminal hydrophobic domain [2]. A conserved decapeptide which ends with the first three
residues of the coiled coil domain and a second pattern that corresponds to part of the central region of the coiled coil
have been selected as signatures for proteins of the stathmin family.
Consensus pattern: P-[KRQ]-[KR](2)-[DE]-x-S-L-[EG]-E
Consensus pattern: A-E-K-R-E-H-E-[KR]-E

[1] Sobel A. Trends Biochem. Sci. 16:301-305(1991). [2] Maucuer A., Moreau J., Mechali M., Sobel A. J. Biol. Chem.
268:16420-16429(1993).

[1486] 612. SUA5/yciO/yrdC family signature. The following uncharacterized proteins have been shown [1] to share
regions of similarities: - Yeast protein SUA5. - Escherichia coli hypothetical protein yciO and HI1198, the corresponding
Haemophilus influenzae protein. - Escherichia coli hypothetical protein yrdC and HI0656, the corresponding
Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein ywlC. - Mycobacterium leprae hypothetical protein in
rfe-hemK intergenic region. - Methanococcus jannaschii hypothetical protein MJ0062. These are proteins of from 20
to 46 Kd which contain a number of conserved regions in their N-terminal section. They can be picked up in the database
by the following pattern.

[1487] Consensus pattern: [LIVMTA](3)-[LIVMFYC]-[PG]-T-[DE]-[STA]-x-[FY]-[GA]-[LIVM]-[GS]
[1488] [1] Bairoch A., Rudd K.E., Robison K. Unpublished observations (1995).
[1489] 613. Sucrose synthase
tem. This is sporophytically controlled by multiple alleles at a single locus (S). S-locus glycoproteins,
as well as S-receptor kinases, are in linkage with the S-alleles [1]. Number of members: 128
[1] Evolutionary aspects of the S-related genes of the Brassica self-incompatibility system: synonymous and
nonsynonymous base substitutions. Hinata K, Watanabe M, Yamakawa S, Satta Y, Isogai A; Genetics 1995;140:1099-1104.
[2] Polymorphism of the S-locus glycoprotein gene (SLG) and the S-locus related gene (SLR1) in Raphanus sativus
L. and self-incompatible ornamental plants in the Brassicaceae. Sakamoto K, Kusaba M, Nishio T; Mol Gen Genet
1998;258:397-403.

[1474] 603. (sdh cyt) Succinate dehydrogenase cytochrome b subunit signatures

Succinate dehydrogenase (SDH) is a membrane-bound complex of two main components: a membrane-extrinsic
component composed of an FAD-binding flavoprotein and an iron-sulfur protein, and a hydrophobic component composed
of a cytochrome b and a membrane anchor protein. The cytochrome b component is a mono heme transmembrane
protein [1,2,3] belonging to a family that groups: - Cytochrome b-556 from bacterial SDH (gene sdhC). - Cytochrome
b560 from the mammalian mitochondrial SDH complex. - Cytochrome b560 subunit encoded in the mitochondrial
genome of some algae and in the plant Marchantia polymorpha. - Cytochrome b from yeast mitochondrial SDH complex
(gene SDH3 or CYB3). - Protein cyt-1 from Caenorhabditis. These cytochromes are proteins of about 130 residues that
comprise three transmembrane regions. There are two conserved histidines which may be involved in binding the heme
group. Two signature patterns have been developed that include these histidine residues.
Consensus pattern: R-P-[LIVMT]-x(3)-[LIVM]-x(6)-[LIVMWPK]-x(4)-S-x(2)-H-R-x-[ST] [H could be a heme ligand]
Consensus pattern: H-x(3)-[GA]-[LIVMT]-R-[HF]-[LIVMF]-x-[FYWM]-D-x-[GVA] [H could be a heme ligand]
[1] Yu L., Wei Y.-Y., Usui S., Yu C.-A. J. Biol. Chem. 267:24508-24516(1992). [2] Abraham P.R., Mulder A., Van't Riet
J., Raue H.A. Mol. Gen. Genet. 242:708-716(1994). [3] Leblanc C., Boyen C., Richard O., Bonnard G., Grienenberger
J.M., Kloareg B. J. Mol. Biol. 250:484-495(1995).
[1475] 604. Sec1 family

[1] The Sec1 family: a novel family of proteins involved in synaptic transmission and general secretion. Halachmi N,
Lev Z; J Neurochem 1996;66:889-897.
Number of members: 4Q 

[1476] 605. Protein secE/sec61-gamma signature

In bacteria, the secE protein plays a role in protein export; it is one of the components (with secY and secA) of the
preprotein translocase. In eukaryotes, the evolutionarily related protein sec61-gamma plays a role in protein translocation
through the endoplasmic reticulum; it is part of a trimeric complex that also consists of sec61-alpha and beta [1]. Both
secE and sec61-gamma are small proteins of about 60 to 90 amino acids that contain a single transmembrane region
at their C-terminal extremity (Escherichia coli secE is an exception, in that it possesses an extra N-terminal segment of
60 residues that contains two additional transmembrane domains). The sequence of secE/sec61-gamma is not extremely
well conserved; however, it is possible to derive a signature pattern centered on a conserved proline located 10
residues before the beginning of the transmembrane domain.

Consensus pattern: [LI\^FY]-x(2)-[DENCX3A]-x(4)-[UVMFTA]-x-{KRV]-x(2)-[KW]-P-x(3)-[SEQ]-x(7)-lU\ni-IL 
(LIVFGASTJ 

[1] Hartmann E., Sommer T., Prehn S., Goerlich D., Jentsch S., Rapoport T.A. Nature 367:654-657(1994).

[1477] 606. 11-S plant seed storage proteins signature

Plant seed storage proteins, whose principal function appears to be the major nitrogen source for the developing plant,
can be classified, on the basis of their structure, into different families. 11-S are non-glycosylated proteins which form
hexameric structures [1,2]. Each of the subunits in the hexamer is itself composed of an acidic and a basic chain
derived from a single precursor and linked by a disulfide bond. This structure is shown in the following representation.
           +-------------------------+
           |                         |
xxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxNGxCxxxxxxxxxxxxxxxxxxxxxxx
<----------Acidic-subunit---------><------Basic-subunit----->
<-----------------About-480-to-500-residues----------------->
                                  ****

'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.

Proteins that belong to the 11-S family are: pea and broad bean legumins, rape
cruciferin, rice glutelins, cotton beta-globulins, soybean glycinins, pumpkin 11-S globulin, oat globulin, sunflower
helianthinin G3, etc. The region that includes the conserved cleavage site between the acidic and basic subunits
(Asn-Gly) and a proximal cysteine residue which is involved in the interchain disulfide bond have been used as a signature
pattern for this family of proteins.

Consensus pattern: N-G-x-[DE](2)-x-[LIVMF]-C-[ST]-x(11,12)-[PAG]-D [C is involved in a disulfide bond]
[1] Hayashi M., Mori K., Nishimura M., Akazawa T., Hara-Nishimura I. Eur. J. Biochem. 172:627-632(1988). [2] Shotwell
M.A., Afonso C., Davies E., Chesnut R.S., Larkins B.A. Plant Physiol. 87:698-704(1988).
[1478] 607. 7S seed storage protein

[1479] 7S globulin is one of the main storage proteins of most angiosperms and gymnosperms. The 7S storage
proteins are homotrimers.
Number of members: 67
(PP2A) is also an enzyme of broad specificity. PP2A is a trimeric enzyme that consists of a core composed of a catalytic
subunit associated with a 65 Kd regulatory subunit and a third variable subunit. In mammals, there are two closely
related isoforms of the catalytic subunit of PP2A: PP2A-alpha and PP2A-beta, encoded by separate genes. - Protein
phosphatase-2B (PP2B or calcineurin), a calcium-dependent enzyme whose activity is stimulated by calmodulin. It is
composed of two subunits: the catalytic A-subunit and the calcium-binding B-subunit. The specificity of PP2B is
restricted. In addition to the above-mentioned enzymes, some additional serine/threonine specific protein phosphatases
have been characterized and are listed below. - Mammalian phosphatase-X (PP-X), and Drosophila phosphatase-V
(PP-V) which are closely related but yet distinct from PP2A. - Yeast phosphatase PPH3, which is similar to PP2A, but
with different enzymatic properties. - Drosophila phosphatase-Y (PP-Y), and yeast phosphatases Z1 and Z2 (genes
PPZ1 and PPZ2) which are closely related but yet distinct from PP1. - Drosophila retinal degeneration protein C (gene
rdgC), a calcium-binding phosphatase required to prevent light-induced retinal degeneration. - Phages Lambda and
Phi-80 ORF-221 which have been shown to have phosphatase activity and are related to mammalian PP's. The best
conserved region in these proteins is a perfectly conserved pentapeptide that can be used as a signature pattern.
Consensus pattern: [LIVM]-R-G-N-H-E

[1] Cohen P. Annu. Rev. Biochem. 58:453-508(1989). [2] Cohen P., Cohen P.T.W. J. Biol. Chem. 264:21435-21438
(1989). [3] Cohen P.T.W., Brewis N.D., Hughes V., Mann D.J. FEBS Lett. 268:355-359(1990).
[1470] 600. Translation initiation factor SUI1 signature

In budding yeast (Saccharomyces cerevisiae), SUI1 is a translation initiation factor that functions in concert with eIF-2
and the initiator tRNA-Met in directing the ribosome to the proper start site of translation [1]. SUI1 is a protein of 108
residues. Close homologs of SUI1 have been found [2] in mammals, insects and plants. SUI1 is also evolutionarily
related to hypothetical proteins from Escherichia coli (yciH), Haemophilus influenzae (HI1225) and Methanococcus
vannielii. A conserved region in the C-terminal section has been selected as a signature pattern.
Consensus pattern: [LIVM]-[EQ]-[LIVM]-Q-G-[DEN]-[KHQ]-[KRV]

[1] Yoon H., Donahue T.F. Mol. Cell. Biol. 12:248-260(1992). [2] Fields C.A., Adams M.D. Biochem. Biophys. Res.
Commun. 198:288-291(1994).

[1471] 601. (S T dehydratase) Serine/threonine dehydratases pyridoxal-phosphate attachment site
Serine and threonine dehydratases [1,2] are functionally and structurally related pyridoxal-phosphate dependent
enzymes: - L-serine dehydratase (EC 4.2.1.13) and D-serine dehydratase (EC 4.2.1.14) catalyze the dehydration of
L-serine (respectively D-serine) into ammonia and pyruvate. - Threonine dehydratase (EC 4.2.1.16) (TDH) catalyzes the
dehydration of threonine into alpha-ketobutyrate and ammonia. In Escherichia coli and other microorganisms, two
classes of TDH are known to exist. One is involved in the biosynthesis of isoleucine, the other in hydroxyamino acid
catabolism. Threonine synthase (EC 4.2.99.2) is also a pyridoxal-phosphate enzyme; it catalyzes the transformation
of homoserine-phosphate into threonine. It has been shown [3] that threonine synthase is distantly related to the serine/
threonine dehydratases. In all these enzymes, the pyridoxal-phosphate group is attached to a lysine residue. The
sequence around this residue is sufficiently conserved to allow the derivation of a pattern specific to serine/threonine
dehydratases and threonine synthases.

Consensus pattern: [DESH]-x(4,5)-[STVG]-x-[AS]-[FYI]-K-[DLIFSA]-[RVMF]-[GA]-[LIVMGA] [The K is the pyridoxal-P
attachment site]

[1] Ogawa H., Gomi T., Konishi K., Date T., Nakashima H., Nose K., Matsuda Y., Peraino C., Pitot H.C., Fujioka M.
J. Biol. Chem. 264:15818-15823(1989). [2] Datta P., Goss T.J., Omnaas J.R., Patil R.V. Proc. Natl. Acad. Sci. U.S.A.
84:393-397(1987). [3] Parsot C. EMBO J. 5:3013-3019(1986). [4] Grabowski R., Hofmeister A.E.M., Buckel W. Trends
Biochem. Sci. 18:297-300(1993).

[1472] Cysteine synthase/cystathionine beta-synthase P-phosphate attachment site

Cysteine synthase (CSase) is the pyridoxal-phosphate dependent enzyme responsible [1] for the formation of cysteine
from O-acetyl-serine and hydrogen sulfide with the concomitant release of acetic acid. In bacteria such as Escherichia
coli, two forms of the enzyme are known (genes cysK and cysM). In plants there are also two forms, one located in the
cytoplasm and the other in chloroplasts. Cystathionine beta-synthase [2] catalyzes the first irreversible step in
homocysteine transulfuration: the conjugation of homocysteine and serine forming cystathionine. Like CSase it is a
pyridoxal-phosphate dependent enzyme. The two types of enzymes are evolutionarily related. The pyridoxal-phosphate group of
CSases has been shown to be attached to a lysine residue which is located in the N-terminal section of these enzymes;
the sequence around this residue is highly conserved and can be used as a signature pattern to detect this class of
enzymes.

Consensus pattern: K-x-E-x(3)-[PA]-[STAGC]-x-S-[IVAP]-K-x-R-x-[STAG]-x(2)-[LIVM] [The 2nd K is the pyridoxal-P
attachment site]

[1] Saito K., Kurosawa M., Murakoshi I. FEBS Lett. 328:111-114(1993). [2] Swaroop M., Bradley K., Ohura T., Tahara
T., Roper M.D., Rosenberg L.E., Kraus J.P. J. Biol. Chem. 267:11456-11461(1992).
[1473] 602. S locus glycop

S-locus glycoprotein family. In Brassicaceae, self-incompatible plants have a self/non-self recognition sys-
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PresentJnallthreedorrHinsofceDular life. Four copies in thetf^^ These however aooear 

to lack the actwe srte residues of Staphylococcal nuclease. Positions 14 {Asp-21) 34 (Ito^^iA^^A?^^ 

'"''^'^ toietio^i s^t^^e^Ltg aid 

nll2r'"l<^ ''ipBv!i^' |2J Canebau. I. Momon JP; Biochem J 1997;321:125-132. 

[1466] 596. SPRY domain

SPRY Domain is named from SPla and the RYanodine Receptor. Domain of unknown function. Distant homologues
are domains in butyrophilin/marenostrin/pyrin homologues.
[1] Ponting C, Schultz J, Bork P; Trends Biochem Sci 1997;22:193-194.
[1467] 597. (SQS PSY) Squalene and phytoene synthases signatures

Two different polyisoprene synthases have been shown [1,2,3] to share a number of regions of sequence similarities:
^Squatene syrithase (EC 2JJJ1) (famesyWiphosphate famesyltransferase) (SOS). Ich catl^erme c^^^ 

pathway The reaction earned out by SQS is catalyzed in two separate steps: the first is a head4o^,ead con3^tton 

b!^ Pn^ri • "^^^ ^ ^"'^^•^s. In yeast « is encoded by the ERG9 gene, in mammals 

by the FDFT1 gene. SQS seems to be membrane-bound. - Phytoene synthase (EC 2.5.1.-) (PSY) which catalyzes

T "^r"" °' ^'Pt^-phate (GGPpVinto phUene. It ii me s;^rs.^Tm: 

biosynthesis of carotenoids from isopentenyl diphosphate. The reaction carried out by PSY is catalyzed in two separate
t^!:;^'"' V '''^'^'^^ condensation of the two molecules of GGPP fo foL preph^Hph^l^nS 
Z^^T 'f Y^."^^"'' '° IS found in all organisms that synmiize carotenoS? plants 

and photosynthetic bacteria as well as some non-photosynthetic bacteria and fungi. In bacteria PSY is encoded by

and PSY share a number of functional similarities which are also reflected at the level of their primary structure. In

^T^dZV^^^T ^'^ "^^^ ^^Y' be involved in substrate binding and/ 

or the catalytic mechanism. Signature patterns have been developed for the second and third conserved regions; they
are localized in the central part of these enzymes.

Consensus pattern: Y-[CSAM]-x(2)-[VSG]-A-[GSA]-[LIVAT]-[IV]-G-x(2)-[LMSC]-x(2)-[LIV]

Consensus pattern: [LIVM]-G-x(3)-Q-x(2,3)-N-[IF]-x-R-D-[LIVMFY]-x(2)-[DE]-x(4,7)-R-x-[FY]-x-P

L.1 t'^J:' ^^^^ ^^"^ 136:185-192(1993).( 2] Robinson G.W.. Tsay YH.. Kienzle B K Smith- 

KuTX, sthem^'Bi^H • ^=^2706-2727(1993).! 3J Roemer S.. Hugueney R. Bouvier F.. cL« B.. 

Kuntz M. Biochem. Biophys. Res. Commun. 196:1414-1421(1993). 

[1468] 598. SRP54-type proteins GTP-binding domain signature 

The signal recognition particle (SRP) is an oligomeric complex that mediates targeting and insertion of the signal
sequence of exported proteins into the membrane of the endoplasmic reticulum. SRP consists of a RNA and six
protein subunits. One of these subunits, the 54 Kd protein (SRP54), is a GTP-binding protein that interacts with the

XgSZT^T ' '^-'^'"'"^ ^ ^ SRP54 include me 

site (G-doma^) and are evolutionary related to similar domains in other proteins whk4i are listed betow [1 1 - EscherichS 

col. and Bacu us subtilis ffh protein (P48). a protein which seems to be me prokaryotic count^^i^ SwS^fT^ 

r"'"^^ GTP-binding protein whfch ensures, in conjunctfon with SRP. me correct teiSig 
eL^STorT^"^ '* r *° ^"^P'^^^ 'nembrane. The G^Jomain is located at me cJeS 
ext emrty of me proteur - Bactenal ftsY protein, a protein which is believed to play a similar lole to mat of me SodZ 
prote^ ^ eukaiyotes^ The G^Jomain is located at the C-terminal extremity of f^e Jroteh. - The pSZ^Jt^eS 

This protein is also believed to be a docking protein. The G<Jomain is also at me C- temiinus - Bacterial flaaellar 
GtJCJ^o TT'-^ "est consented regions in those domains are me sequence m<." tiSte pL" ? he 

H ^ ^^""^ "P"^*"^ '° •^^^ "^^y not used as a signato^patlem 

Instead, a consented region kjcated at me C-terminal end of me domain was selected lurepanem. 
C<«sensus pattem: P4IJVMl-x-[FYLHLIVMATl-IGShx-IGS]-p^^^ ^ D Wise J A. 

Nucleic Acids Res. 22:1933-1947(1994).

[1469] 599. (STphosphatase) Serine/threonine specific protein phosphatases signature

Serine/threonine specific protein phosphatases (EC 3.1.3.16) (PP) [1,2,3] are enzymes that catalyze the removal of a
phosphate group attached to a serine or evolutionary related. - Protein phosphat^se-l (PPi) is^ ^^J^it^ 
^rficrty. It « inhto»ed by two mermostab.e prote^s. inhibitor-1 and -l In Lnmals. Li^l^S^iylS 

7i'^ZT ^- r T "T"^' """^ ^ '^^^ the actton of me nimA kinase. In yeast Ppl 
1 (gene SIT4) ,s uivolved « dephosphonrlating me large subunit of RNA polymerase II. - Protein phospJSse-^ 
gamma-1 and -2. - Mammalian phosphatidylinositol 3-kinase regulatory p85 subunit. - Mammalian Ras GTPase-activating protein (GAP). - Adaptor proteins mediating binding of guanine nucleotide exchange factors to growth factor receptors: vertebrate GRB2, Caenorhabditis elegans sem-5 and Drosophila DRK, all of which have two SH3 domains.

- Mammalian Vav oncoprotein, a guanine nucleotide exchange factor of the CDC24 family. - Some guanine-nucleotide releasing factors of the CDC25 family: yeast CDC25, yeast SCD25, fission yeast ste6. - MAGUK proteins. These proteins consist of at least three types of domains: one or more copies of the DHR domain, a SH3 domain and a C-terminal guanylate kinase domain. Members of this family are: Drosophila lethal(1)discs large-1 tumor suppressor protein (gene Dlg1), mammalian tight junction protein ZO1, vertebrate erythrocyte membrane protein p55, Caenorhabditis elegans protein lin-2, rat protein CASK and mammalian synaptic proteins SAP90/PSD-95, CHAPSYN-110/PSD-93, SAP97/DLG1 and SAP102. - Miscellaneous proteins interacting with vertebrate receptor protein tyrosine kinases: mammalian cytoplasmic protein Nck (3 copies), oncoprotein Crk (2 copies). - Chicken Src substrate p80/85 protein (cortactin) and the similar human hemopoietic lineage cell specific protein HS1. - Mammalian dihydropyridine-sensitive L-type calcium channel beta (regulatory) subunit including the related human myasthenic syndrome antigen B (MSYB).

- Mammalian neutrophil cytosolic activators of NADPH oxidase: p47 (NCF-1), p67 (NCF-2), and a potential homolog from Caenorhabditis elegans (B0303.7). NCF-1 and -2 have two copies of the SH3 domain, while B0303.7 has four. - Some myosin heavy chains from amoebae, slime molds and yeast (gene MYO3). - Vertebrate and Drosophila spectrin and fodrin alpha-chain. - Human amphiphysin. - Yeast actin-binding protein ABP1. - Yeast actin-binding protein SLA1 (3 copies). - Yeast protein BEM1 and the fission yeast homolog scd2 (or ral3) (2 copies). - Yeast BEM1-binding proteins BOI2 (BEB1) and BOB1 (BOI1). - Yeast fusion protein FUS1. - Yeast protein RVS167. - Yeast protein SSU81. - Yeast hypothetical proteins YAR014c (1 copy), YFR024c (1 copy), YHL002w (1 copy), YHR016c (1 copy), YJL020c (1 copy), YHR114w (2 copies) and the fission yeast homolog SpAC12C2.05c. - Caenorhabditis elegans hypothetical proteins F42H10.3. The profile developed to detect SH3 domains is based on a structural alignment consisting of 5 gap-free blocks and 4 linker regions totaling 62 match positions.

[1] Mayer B.J., Hamaguchi M., Hanafusa H. Nature 332:272-275(1988). [2] Musacchio A., Gibson T., Lehto V.P., Saraste M. FEBS Lett. 307:55-61(1992). [3] Pawson T., Schlessinger J. Curr. Biol. 3:434-442(1993). [4] Mayer B.J., Baltimore D. Trends Cell Biol. 3:8-13(1993). [5] Pawson T. Nature 373:573-580(1995). [6] Kuriyan J., Cowburn D. Curr. Opin. Struct. Biol. 3:828-837(1993). [7] Morton C.J., Campbell I.D. Curr. Biol. 4:615-617(1994).
[1457] 590. Serine hydroxymethyltransferase pyridoxal-phosphate attachment site (SHMT)
Serine hydroxymethyltransferase (EC 2.1.2.1) (SHMT) [1] catalyzes the transfer of the hydroxymethyl group of serine to tetrahydrofolate to form 5,10-methylenetetrahydrofolate and glycine. In vertebrates, it exists in a cytoplasmic and a mitochondrial form whereas only one form is found in prokaryotes. Serine hydroxymethyltransferase is a pyridoxal-phosphate containing enzyme. The pyridoxal-P group is attached to a lysine residue around which the sequence is highly conserved in all forms of the enzyme.

Consensus pattern: [DEH]-[LIVMFY]-x-[STMV]-[GST]-[ST](2)-H-K-[ST]-[LF]-x-G-[PAC]-[RQ]-[GSA]-... [K is the pyridoxal-P attachment site]

[1] Usha R., Savithri H.S., Rao N.A. Biochim. Biophys. Acta 1204:75-83(1994).
[1458] 591. SIS domain

SIS (Sugar ISomerase) domains are found in many phosphosugar isomerases and phosphosugar binding proteins.
[1] Teplyakov A, Obmolova G, Badet-Denisot MA, Badet B, Polikarpov I; Structure 1998;6:1047-1055.
[1459] 592. (SKI) Shikimate kinase signature

Shikimate kinase (EC 2.7.1.71) catalyzes the fifth step in the biosynthesis from chorismate of the aromatic amino acids (the shikimate pathway) in bacteria (gene aroK or aroL), plants and in fungi (where it is part of a multifunctional enzyme which catalyzes five consecutive steps in this pathway). Shikimate kinase is a small protein of about 200 residues. A conserved region that contains a run of three glycines has been selected as a signature pattern.
Consensus pattern: [KR]-x(2)-E-x(3)-[LIVMF]-x(8,12)-[LIVMF](2)-[SA]-x-G(3)-x-[LIVMF]. Proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop).
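The entry states two conditions for this family: the shikimate kinase signature (with its variable gap x(8,12) and its run of three glycines) and a copy of the P-loop motif 'A'. A minimal sketch of that combined check follows. It assumes both patterns can be written directly as Python regular expressions; the function name and the test sequences are hypothetical and were invented for illustration only.

```python
import re

# Shikimate kinase signature: x(8,12) becomes .{8,12}, G(3) becomes G{3}.
SHIKIMATE = re.compile(r"[KR].{2}E.{3}[LIVMF].{8,12}[LIVMF]{2}[SA].G{3}.[LIVMF]")
# ATP/GTP-binding motif 'A' (P-loop): [AG]-x(4)-G-K-[ST].
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def has_both_motifs(sequence: str) -> bool:
    """True only if the sequence carries the signature and a P-loop."""
    return bool(SHIKIMATE.search(sequence)) and bool(P_LOOP.search(sequence))


# Hypothetical building blocks (not taken from any database entry).
p_loop_part = "GPMGAGKST"
signature_part = "KAAEAAAL" + "D" * 9 + "IVSTGGGAV"

print(has_both_motifs("M" + p_loop_part + signature_part))  # both motifs present
print(has_both_motifs("M" + signature_part))                # signature only
print(has_both_motifs("M" + p_loop_part))                   # P-loop only
```

Requiring both motifs is stricter than either pattern alone, which matters because the P-loop by itself occurs in many unrelated nucleotide-binding proteins.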
[1460] 593. SNAP-25 family

[1461] SNAP-25 (synaptosome-associated protein 25 kDa) proteins are components of SNARE complexes. Members of this family contain a cluster of cysteine residues that can be palmitoylated for membrane attachment [2].
[1462] [1] Brennwald P, Kearns B, Champion K, Keranen S, Bankaitis V, Novick P; Cell 1994;79:245-258. [2] Risinger C, Blomqvist AG, Lundell I, Lambertsson A, Nassel D, Pieribone VA, Brodin L, Larhammar D; J Biol Chem 1993;268:24408-24414.

[1463] 594. SNF2 and others N-terminal domain

[1464] This domain is found in proteins involved in a variety of processes including transcription regulation (e.g., SNF2, STH1, brahma, MOT1), DNA repair (e.g., ERCC6, RAD16, RAD5), DNA recombination (e.g., RAD54), and chromatin unwinding (e.g., ISWI) as well as a variety of other proteins with little functional information (e.g., lodestar, ETL1).

[1465] 595. Staphylococcal nuclease homologues (Snase)
the second is an almost perfectly conserved glycine-rich nonapeptide.

Consensus pattern: G-A-G-D-Q-G-x(3)-G-[FYH]
Consensus pattern: G-[GA]-G-[ASC]-F-S-x-K-[DE]
[1] Horikawa S., Sasuga J., Shimizu K., Ozasa H., Tsukada K. J. Biol. Chem. 265:13683-13686(1990).
[1452] 585. S1 RNA binding domain

The S1 domain occurs in a wide range of RNA-associated proteins. It is structurally similar to cold shock protein, which binds nucleic acids. The S1 domain has an OB-fold structure.
[1] Bycroft M, Hubbard TJ, Proctor M, Freund SM, Murzin AG; Cell 1997;88:235-242.
[1453] 586. SAICAR synthetase signatures

Phosphoribosylaminoimidazole-succinocarboxamide synthase (EC 6.3.2.6) (SAICAR synthetase) catalyzes the seventh step in the de novo purine biosynthetic pathway: the ATP-dependent conversion of 5'-phosphoribosyl-5-aminoimidazole-4-carboxylic acid and aspartic acid to SAICAR [1]. In bacteria (gene purC), fungi (gene ADE1) and plants, SAICAR synthetase is a monofunctional protein; in higher vertebrates it is the N-terminal domain of a bifunctional enzyme that also catalyzes phosphoribosylaminoimidazole carboxylase (AIRC) activity. Two conserved regions in the central section of this enzyme have been selected as signature patterns for SAICAR synthetase.
Consensus pattern: [LIVMF](2)-P-[LIVM]-E-x-[LIVM]-[LIVMCA]-R-x(3)-[TA]-G-S
Consensus pattern: [LIVM]-[LIVMA]-D-x-K-[LIVMFY]-E-F-G
[1] Zalkin H., Dixon J.E. Prog. Nucleic Acid Res. Mol. Biol. 42:259-287(1992).
[1454] 587. (SCP) Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signatures

[...] eukaryotes have been found to be evolutionary related: - Rodent sperm-coating glycoprotein (SCP), also known as acidic epididymal glycoprotein (AEG). This protein is thought to be involved in sperm maturation [1]. It is a protein of about 220 residues and probably contains eight disulfide bonds. - Mammalian [...]. - Mammalian glioma pathogenesis-related protein (GliPR). - Lizard helothermine, a toxin that blocks ryanodine receptors. - Venom allergen 5 (Ag5) from vespid wasps and venom allergen 3 (Ag3) from fire ants. These proteins are potent allergens and are the main cause of allergic reactions to stings from insects of the hymenoptera family [3]. Ag5/3 are proteins of about 200 residues and contain four disulfide bonds. - Plant pathogenesis proteins of the PR-1 family [4]. These proteins are synthesized during pathogen infection or other stress-related responses. They are proteins of about 130 to 140 residues and probably contain three disulfide bonds. - Proteins Sc7 and Sc14 from the basidiomycete fungus Schizophyllum commune. These extracellular proteins are loosely associated with fruit body hyphal walls [5]. Sc7/14 are proteins of about 180 residues and [...] hookworm. - Yeast hypothetical proteins YJL078c, YJL079c and YKR013w. The exact function of these proteins is not yet known. Two conserved regions located in their C-terminal half have been selected as signature patterns. The second signature contains a cysteine which is known to be involved in a disulfide bond in Ag5.

Consensus pattern: [GDER]-H-[FYWH]-T-Q-[LIVM](2)-W-x(2)-[STN]

Consensus pattern: [...]-Y-x-[...]-N-[LIVMFYWDN] [C is involved in a disulfide bond]
[1] [...] Kasahara M. Mol. Cell. Endocrinol. 89:25-32(1992). [2] Kasahara M., Gutknecht J., Brew K., Spurr N., [...]. [3] [...] J. Immunol. 150:2823-2830(1993). [4] Dixon D.C., Cutt J.R., Klessig D.F. EMBO J. 10:1317-1324(1991). [5] Schuren F.H.J., Asgeirsdottir S.A., Kothe E.M., Scheer J.M.J., Wessels J.G.H. J. Gen. Microbiol. 139:2083-2090(1993).
[1455] 588. SET domain

SET domains appear to be protein-protein interaction domains. It has been demonstrated that SET domains mediate interactions with a family of proteins that display similarity with dual-specificity phosphatases (dsPTPases) [2].

[1] Tripoulas N, LaJeunesse D, Gildea J, Shearn A; Genetics 1996;143:913-928. [2] Cui X, De Vivo I, Slany R, Miyamoto A, Firestein R, Cleary ML; Nat Genet 1998;18:331-337.

[1456] 589. Src homology 3 (SH3) domain profile

The Src homology 3 (SH3) domain is a small protein domain of about 60 amino-acid residues first identified as a conserved sequence in the non-catalytic part of several cytoplasmic protein tyrosine kinases (e.g. Src, Abl, Lck) [1]. Since then, it has been found in a great variety of other intracellular or membrane-associated proteins [2,3,4,5]. The [...] of five or six beta-strands arranged as two tightly [...] anti-parallel beta sheets. The linker regions may contain short helices [6]. The function of the SH3 domain is not well understood. [...] assembly of specific protein complexes via binding to proline-rich peptides [7]. In general SH3 domains are found as single copies in a given protein, but there is a significant number of proteins with two SH3 domains and a few with 3 or 4 copies. So far, SH3 domains have been identified in the following proteins: - [...] vertebrate and retroviral cytoplasmic (non-receptor) protein tyrosine kinases. In particular in the Src, Abl, Btk, Csk and ZAP70 families of kinases. - Mammalian phosphatidylinositol-specific phospholipase C-
Consensus pattern: [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM]. In most cases the residue in position 3 of the pattern is either Tyr or Phe.
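Position 3 of the RNP-1 pattern is written with braces, meaning that the listed residues must not occur there. The sketch below shows one way to express this as a negated character class and to report which residue actually occupies position 3 in each hit. The function name and both sequences are hypothetical and serve only as an illustration.

```python
import re

# RNP-1 signature as a regular expression.
# {EDRKHPCG} (residues that must NOT occur) becomes the negated class [^EDRKHPCG].
RNP1 = re.compile(r"[RK]G[^EDRKHPCG][AGSCI][FY][LIVA].[FYLM]")


def rnp1_position3(sequence: str) -> list:
    """Return the residue found at pattern position 3 for every RNP-1 hit."""
    return [m.group(0)[2] for m in RNP1.finditer(sequence)]


# Hypothetical sequences, invented for illustration only.
accepted = "AAKGFGFVTFDD"   # Phe at position 3: allowed
rejected = "AARGEGFVTFDD"   # Glu at position 3: excluded by the pattern

print(rnp1_position3(accepted))
print(rnp1_position3(rejected))
```

Tallying the returned residues over a real set of hits would be one way to check the remark that position 3 is usually Tyr or Phe.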

[1] Bandziulis R.J., Swanson M.S., Dreyfuss G. Genes Dev. 3:431-437(1989). [2] Dreyfuss G., Swanson M.S., Pinol-Roma S. Trends Biochem. Sci. 13:86-91(1988). [3] Milburn S.C., Hershey J.W.B., Davies M.V., Kelleher K., Kaufman R.J. EMBO J. 9:2783-2790(1990). [4] Szabo A., Dalmau J., Manley G., Rosenfeld M., Wong E., Henson J., Posner J.B., Furneaux H.M. Cell 67:325-333(1991). [5] Rebagliati M. Cell 58:231-232(1989). [6] Li Y., Sugiura M. EMBO J. 9:3059-3066(1990). [7] Ayane M., Preuss U., Koehler G., Nielsen P.J. Nucleic Acids Res. 19:1273-1278(1991). [8] Kawakami A., Tian Q., Duan X., Streuli M., Schlossman S.F., Anderson P. Proc. Natl. Acad. Sci. U.S.A. 89:8681-8685(1992). [9] Minvielle-Sebastia L., Winsor B., Bonneaud N., Lacroute F. Mol. Cell. Biol. 11:3075-3087(1991).
[1443] 581. Rubredoxin signature

[1444] Rubredoxins [1] are small electron-transfer prokaryotic proteins. They contain an iron atom which is ligated by four cysteine residues. Rubredoxins are, in some cases, functionally interchangeable with ferredoxins.
[1445] A conserved region that includes two of the cysteine residues that bind the iron atom has been selected as a pattern for these proteins.

[1446] Consensus pattern: [LIVM]-x(3)-W-x-C-P-x-C-[AGD] [The two C's bind the iron atom]
In Pseudomonas oleovorans rubredoxin 2 (gene alkG) [2], this pattern is found twice because alkG has two rubredoxin domains.
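Counting how often the pattern occurs in a sequence separates a single-domain rubredoxin from a two-domain protein such as alkG. The following is a minimal sketch of that count. The sequences are hypothetical constructs built around one matching segment, not real rubredoxin sequences, and the function name is an assumption.

```python
import re

# [LIVM]-x(3)-W-x-C-P-x-C-[AGD] as a regular expression.
RUBREDOXIN = re.compile(r"[LIVM].{3}W.CP.C[AGD]")


def count_rubredoxin_sites(sequence: str) -> int:
    """Count non-overlapping occurrences of the rubredoxin pattern."""
    return sum(1 for _ in RUBREDOXIN.finditer(sequence))


# Hypothetical sequences assembled from one matching segment.
site = "IPDDWVCPLCG"
one_domain = "MK" + site + "EE"
two_domains = "MK" + site + "AAAA" + site + "EE"

print(count_rubredoxin_sites(one_domain))
print(count_rubredoxin_sites(two_domains))
```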

Rubrerythrin [3], a protein with inorganic pyrophosphatase activity from Desulfovibrio vulgaris possesses a C-terminal 
rubredoxin-like domain, but this domain is too divergent to be detected by the above pattern. 

[1447] [1] Berg J.M., Holm R.H. (In) Iron-sulfur proteins, Spiro T.G., Ed., pp1-66, Wiley, New-York, (1982). [2] Kok M., Oldenhuis R., van der Linden M.P.G., Meulenberg C.H.C., Kingma J., Witholt B. J. Biol. Chem. 264:5442-5451(1989). [3] van Beeumen J.J., van Driessche G., Liu M.-Y., Le Gall J. J. Biol. Chem. 266:20645-20653(1991).
[1448] 582. (rvp) Eukaryotic and viral aspartyl proteases active site

Aspartyl proteases, also known as acid proteases (EC 3.4.23.-), are a widely distributed family of proteolytic enzymes [1,2,3] known to exist in vertebrates, fungi, plants, retroviruses and some plant viruses. Aspartate proteases of eukaryotes are monomeric enzymes which consist of two domains. Each domain contains an active site centered on a catalytic aspartyl residue. The two domains most probably evolved from the duplication of an ancestral gene encoding a primordial domain. Currently known eukaryotic aspartyl proteases are: - Vertebrate gastric pepsins A and C (also known as gastricsin). - Vertebrate chymosin (rennin), involved in digestion and used for making cheese. - Vertebrate lysosomal cathepsins D (EC 3.4.23.5) and E (EC 3.4.23.34). - Mammalian renin (EC 3.4.23.15), whose function is to generate angiotensin I from angiotensinogen in the plasma. - Fungal proteases such as aspergillopepsin A (EC 3.4.23.18), candidapepsin (EC 3.4.23.24), mucoropepsin (EC 3.4.23.23) (mucor rennin), endothiapepsin (EC 3.4.23.22), polyporopepsin (EC 3.4.23.29), and rhizopuspepsin (EC 3.4.23.21). - Yeast saccharopepsin (EC 3.4.23.25) (proteinase A) (gene PEP4). PEP4 is implicated in posttranslational regulation of vacuolar hydrolases. - Yeast barrierpepsin (EC 3.4.23.35) (gene BAR1); a protease that cleaves alpha-factor and thus acts as an antagonist of the mating pheromone. - Fission yeast sxa1, which is involved in degrading or processing the mating pheromones. Most retroviruses and some plant viruses, such as badnaviruses, encode an aspartyl protease which is a homodimer of a chain of about 95 to 125 amino acids. In most retroviruses, the protease is encoded as a segment of a polyprotein which is cleaved during the maturation process of the virus. It is generally part of the pol polyprotein and, more rarely, of the gag polyprotein. Conservation of the sequence around the two aspartates of eukaryotic aspartyl proteases and around the single active site of the viral proteases allows us to develop a single signature pattern for both groups of protease.
Consensus pattern: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-x-[LIVMFGTA] [D is the active site residue]
[1] Foltmann B. Essays Biochem. 17:52-84(1981). [2] Davies D.R. Annu. Rev. Biophys. Chem. 19:189-215(1990). [3] Rao J.K.M., Erickson J.W., Wlodawer A. Biochemistry 30:4663-4671(1991). [4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:105-120(1995).
[1449] 583. (rvt) Reverse transcriptase (RNA-dependent DNA polymerase)

A reverse transcriptase gene is usually indicative of a mobile element such as a retrotransposon or retrovirus. Reverse transcriptases occur in a variety of mobile elements, including retrotransposons, retroviruses, group II introns, bacterial msDNAs, hepadnaviruses, and caulimoviruses. Number of members: 1233

[1450] [1] Medline: 91006031. Origin and evolution of retroelements based upon their reverse transcriptase sequences. Xiong Y, Eickbush TH; EMBO J 1990;9:3353-3362.
[1451] 584. (S-AdoMet synt) S-adenosylmethionine synthetase signatures

S-adenosylmethionine synthetase (EC 2.5.1.6) is the enzyme that catalyzes the formation of S-adenosylmethionine (AdoMet) from methionine and ATP [1]. AdoMet is an important methyl donor for transmethylation and is also the propylamino donor in polyamine biosynthesis. In bacteria there is a single isoform of AdoMet synthetase (gene metK), there are two in budding yeast (genes SAM1 and SAM2) and in mammals, while in plants there is generally a multigene family. The sequence of AdoMet synthetase is highly conserved throughout isozymes and species. Two signature patterns have been selected for this type of enzyme; the first is a hexapeptide which seems to be involved in ATP-binding;
structurally and functionally related 30 Kd glycoproteins [1] that cleave the 3'-5' internucleotide linkage of RNA via a nucleotide 2',3'-cyclic phosphate intermediate (EC 3.1.27.1). A number of other RNAses have been found to be evolutionary related to these fungal enzymes: - Self-incompatibility [2] in flowering plants is often controlled by a single gene (S-gene) that has several alleles. This gene prevents fertilization by self-pollen or by pollen bearing either of the two S-alleles expressed in the style. The self-incompatibility glycoprotein from several higher plants of the solanaceae family has been shown [2,3] to be a ribonuclease. - Phosphate-starvation induced RNAses LE and LX from tomato [4]. These two enzymes are probably involved in a phosphate-starvation rescue system. - Escherichia coli periplasmic RNAse I (EC 3.1.27.6) (gene rna) [5]. - Aeromonas hydrophila periplasmic RNAse. - Haemophilus influenzae hypothetical protein HI0526. Two histidine residues have been shown [6,7] to be involved in the catalytic mechanism of RNase T2 and Rh. These residues and the region around them are highly conserved in all the sequences described above. Two signature patterns have been developed, one for each of the two active-site histidines. The second pattern also contains a cysteine which is known to be involved in a disulfide bond.
Consensus pattern: [FYWL]-x-[LIVM]-H-G-L-W-P [H is an active site residue]

Consensus pattern: [LIVMF]-x(2)-[HDGTY]-[EQ]-[FYW]-x-[KR]-H-G-x-C [H is an active site residue] [C is involved in a disulfide bond]
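Because each of the two patterns marks a specific functional residue, a scan can report the position of that residue and not only the start of the match. The sketch below does this with capture groups around the active-site histidines and the disulfide cysteine. The sequence is a hypothetical construct, and the function name and dictionary keys are assumptions made for the example.

```python
import re

# Capture groups mark the residues annotated in the two consensus patterns.
HIS_1 = re.compile(r"[FYWL].[LIVM](H)GLWP")
HIS_2 = re.compile(r"[LIVMF].{2}[HDGTY][EQ][FYW].[KR](H)G.(C)")


def active_site_positions(sequence: str) -> dict:
    """Return 0-based positions of the annotated His and Cys residues."""
    first = [m.start(1) for m in HIS_1.finditer(sequence)]
    second = [(m.start(1), m.start(2)) for m in HIS_2.finditer(sequence)]
    return {"his_1": first, "his_2_and_cys": second}


# Hypothetical sequence containing one copy of each signature.
toy = "AAFTIHGLWPDDGGLKKHEWEKHGTCAA"
print(active_site_positions(toy))
```

A sequence in which only one of the two lists is non-empty would carry only one of the two signatures.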

[1] Watanabe K., Naitoh A., Suyama Y., Inokuchi N., Shimada H., Koyama T., Ohgi K., Irie M. J. Biochem. 108:303-310(1990). [2] Haring V., Gray J.E., McClure B.A., Anderson M.A., Clarke A.E. Science 250:937-941(1990). [3] McClure B.A., Haring V., Ebert P.R., Anderson M.A., Simpson R.J., Sakiyama R., Clarke A.E. Nature 342:955-957(1989). [4] Loeffler A., Glund K., Irie M. Eur. J. Biochem. 214:627-633(1993). [5] Meador J. III, Kennell D. Gene 95:1-7(1990). [6] Kawata Y., Sakiyama R., Hayashi R., Kyogoku Y. Eur. J. Biochem. 187:255-262(1990). [7] Kurihara H., Mitsui Y., Ohgi K., Irie M., Mizuno K., Nakamura K.T. FEBS Lett. 306:189-192(1992).

[1437] 578. Ribonucleotide reductase large subunit signature. Ribonucleotide reductase (EC 1.17.4.1) [1,2] catalyzes the reductive synthesis of deoxyribonucleotides from their corresponding ribonucleotides. It provides the precursors necessary for DNA synthesis. Ribonucleotide reductase is an oligomeric enzyme composed of a large subunit (700 to 1000 residues) and a small subunit (300 to 400 residues). There are regions of similarities in the sequence of the large chain from prokaryotes, eukaryotes and viruses. One of these regions has been developed as a signature pattern.
[1438] Consensus pattern: W-x(2)-[LF]-x(6,7)-G-[LIVM]-[FYRA]-[NH]-x(3)-[STAQLIVM]-[ASC]-x(2)-[PA]
[1439] [1] Nilsson O., Lundqvist T., Hahne S., Sjoberg B.-M. Biochem. Soc. Trans. 16:91-94(1988). [2] Reichard P. Science 260:1773-1777(1993).
[1440] 579. RNase H

RNase H digests the RNA strand of an RNA/DNA hybrid. It is an important enzyme in the retroviral replication cycle, and is often found as a domain associated with reverse transcriptases. The structure is a mixed alpha+beta fold with three a/b/a layers.
[1441] 580. Eukaryotic putative RNA-binding region RNP-1 signature (rrm)

Many eukaryotic proteins that are known or supposed to bind single-stranded RNA contain one or more copies of a putative RNA-binding domain of about 90 amino acids [1,2]. This region has been found in the following proteins: - Heterogeneous nuclear ribonucleoproteins - hnRNP A1 (helix destabilizing protein) (twice). - hnRNP A2/B1 (twice). - hnRNP C (C1/C2) (once). - hnRNP E (UP2) (at least once). - hnRNP G (once). - Small nuclear ribonucleoproteins - U1 snRNP 70 Kd (once). - U1 snRNP A (once). - U2 snRNP B'' (once). Pre-RNA and mRNA associated proteins - Protein synthesis initiation factor 4B (eIF-4B) [3], a protein essential for the binding of mRNA to ribosomes (once). - Nucleolin (4 times). - Yeast single-stranded nucleic acid-binding protein (gene SSB1) (once). - Yeast protein NSR1 (twice). NSR1 is involved in pre-rRNA processing; it specifically binds nuclear localization sequences. - Poly(A) binding protein (PABP) (4 times). ** Others ** - Drosophila sex determination protein Sex-lethal (Sxl) (twice). - Drosophila sex determination protein Transformer-2 (Tra-2) (once). - Drosophila 'elav' protein (3 times), which is probably involved in the RNA metabolism of neurons. - Human paraneoplastic encephalomyelitis antigen HuD (3 times) [4], which is highly similar to elav and which may play a role in neuron-specific RNA processing. - Drosophila 'bicoid' protein (once) [5], a segment-polarity homeobox protein that may also bind to specific mRNAs. - La antigen (once), a protein which may play a role in the transcription of RNA polymerase III. - The 60 Kd Ro protein (once), a putative RNP complex protein. - A maize protein induced by abscisic acid in response to water stress, which seems to be an RNA-binding protein. - Three tobacco proteins, located in the chloroplast [6], which may be involved in splicing and/or processing of chloroplast RNAs (twice). - X16 [7], a mammalian protein which may be involved in RNA processing in relation with cellular proliferation and/or maturation. - Insulin-induced growth response protein CI-4 from rat (twice). - Nucleolysins TIA-1 and TIAR (3 times) [8], which possess nucleolytic activity against cytotoxic lymphocyte target cells and may be involved in apoptosis. - Yeast RNA15 protein, which plays a role in mRNA stability and/or poly-(A) tail length [9]. Inside the putative RNA-binding domain there are two regions which are highly conserved. The first one is a hydrophobic segment of six residues (which is called the RNP-2 motif), the second one is an octapeptide motif (which is called RNP-1 or RNP-CS). The position of both motifs in the domain is shown in the following schematic representation:

[1442]
xxxxxxx######xxxxxxxxxxxxxxxxxxxxxxxxxxxxx########xxxxxxxxxxxxxxxxxxxxxx
       RNP-2                              RNP-1
The RNP-1 motif has been used as a signature pattern for this type of domain.
loop. Examples of such proteins are the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins the flexible loop exists in a slightly different form; this is the case for tubulins or protein kinases. A special mention must be reserved for adenylate kinase, in which there is a single deviation from the P-loop pattern: in the last position Gly is found instead of Ser or Thr.
Consensus pattern: [AG]-x(4)-G-K-[ST]

In addition to the proteins listed above, the 'A' motif is also found in a number of other proteins. Most of these proteins probably bind a nucleotide, but others are definitely not ATP- or GTP-binding (as for example chymotrypsin, or human ferritin light chain).
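The consensus patterns in these entries share one notation: brackets list the residues allowed at a position, braces list residues that are excluded, x stands for any residue, and a number in parentheses gives a repeat count or range. As a minimal sketch, the notation can be translated into a regular expression and used to scan a sequence. The function names are hypothetical, the sequence fragment was written for illustration to resemble the start of a Ras-like GTPase, and the translation assumes patterns without the anchoring symbols that the full notation also allows.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate the residue specification from an optional repeat
        # such as (4) or (6,7).
        parsed = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = parsed.groups()
        if core == "x":
            core = "."                       # x means any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {..} lists excluded residues
        # [..] classes and single letters are already valid regex syntax.
        if high is not None:
            core += "{%s,%s}" % (low, high)
        elif low is not None:
            core += "{%s}" % low
        pieces.append(core)
    return "".join(pieces)


def find_hits(pattern: str, sequence: str) -> list:
    """Return (0-based start, matched text) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group(0)) for m in regex.finditer(sequence)]


P_LOOP = "[AG]-x(4)-G-K-[ST]"
# Hypothetical fragment, for illustration only.
fragment = "MTEYKLVVVGAGGVGKSALTIQ"

print(prosite_to_regex(P_LOOP))
print(find_hits(P_LOOP, fragment))
```

A hit from such a scan only indicates that the motif is present. As the text notes, some proteins that match the pattern do not bind ATP or GTP, so a match is a hint and not a functional assignment.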

[1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982). [2] Moller W., Amons R. FEBS Lett. 186:1-7(1985). [3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986). [4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987). [5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990). [6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993). [7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:571-592(1990). [8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata). [9] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989). [10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).
[1431] GTP-binding nuclear protein ran signature (ras) 

Ran (or TC4) is a small abundant nuclear protein that binds and hydrolyzes GTP and which has been implicated in a large number of processes including nucleocytoplasmic transport, RNA synthesis, processing and export and cell cycle checkpoint control [1,2]. Ran is generally included in the RAS 'superfamily' of small GTP-binding proteins [3], but it is only slightly related to the other RAS proteins. It also differs from RAS proteins in that it lacks cysteine residues at its C-terminal and is therefore not subject to prenylation. Instead ran has an acidic C-terminus. It is, however, similar to RAS family members in requiring a specific guanine nucleotide exchange factor (GEF) and a specific GTPase activating protein (GAP) as stimulators of overall GTPase activity. The region of the GTP-binding B motif which, in ran, is perfectly conserved has been selected as a signature pattern. Consensus pattern: D-T-A-G-Q-E-K-[LF]-G-G-L-R-[DE]-G-Y-Y. Proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop).
[1] Scheffzek K., Klebe C., Fritz-Wolf K., Kabsch W., Wittinghofer A. Nature 374:378-381(1995). [2] Rush M.G., Drivas G., D'Eustachio P. BioEssays 18:103-112(1996). [3] Valencia A., Chardin P., Wittinghofer A., Sander C. Biochemistry 30:4637-4648(1991).
[1432] 574. recA signature

The bacterial recA protein [1,2,3] is essential for homologous recombination and recombinational repair of DNA damage. RecA has many activities: it filaments, it binds to single- and double-stranded DNA, it binds and hydrolyzes ATP, it is also a recombinase and, finally, it interacts with lexA causing its activation and leading to its autocatalytic cleavage. RecA is a protein of about 350 amino-acid residues. Its sequence is very well conserved [3,4,5,E1] among eubacterial species. It is also found in the chloroplast of plants [6]. The best conserved region, a nonapeptide located in the middle of the sequence which is part of the monomer-monomer interface in a recA filament, has been selected as a signature pattern.

Consensus pattern: A-L-[KR]-[IF]-[FY]-[STA]-[STAD]-[LIVMQ]-R.

[1] Smith K.C., Wang T.-C.V. BioEssays 10:12-16(1989). [2] Lloyd A.T., Sharp P.M. J. Mol. Evol. 37:399-407(1993). [3] Roca A.I., Cox M.M. Prog. Nucleic Acids Res. Mol. Biol. 56:129-223(1997). [4] Karlin S., Weinstock G.M., Brendel V. J. Bacteriol. 177:6881-6893(1995). [5] Eisen J.A. J. Mol. Evol. 41:1105-1123(1995). [6] Cerutti H.D., Osman M., Grandoni P., Jagendorf A.T. Proc. Natl. Acad. Sci. U.S.A. 89:8068-8072(1992). [E1] http://www.tigr.org/~jeisen/RecA/RecA.html

[1433] 575. Response regulator receiver domain 

This domain receives the signal from the sensor partner in bacterial two-component systems. It is usually found N-terminal to a DNA binding effector domain.
[1] Pao GM, Saier MH; J Mol Evol 1995;40:136-154.
[1434] 576. Ribonucleotide reductase large subunit signature 

Ribonucleotide reductase (EC 1.17.4.1) [1,2] catalyzes the reductive synthesis of deoxyribonucleotides from their corresponding ribonucleotides. It provides the precursors necessary for DNA synthesis. Ribonucleotide reductase is an oligomeric enzyme composed of a large subunit (700 to 1000 residues) and a small subunit (300 to 400 residues). There are regions of similarities in the sequence of the large chain from prokaryotes, eukaryotes and viruses. One of these regions has been selected as a signature pattern.

[1435] Consensus pattern: W-x(2)-[LF]-x(6,7)-G-[LIVM]-[FYRA]-[NH]-x(3)-[STAQLIVM]-[ASC]-x(2)-[PA]
[1] Nilsson O., Lundqvist T., Hahne S., Sjoberg B.-M. Biochem. Soc. Trans. 16:91-94(1988). [2] Reichard P. Science 260:1773-1777(1993).

[1436] 577. Ribonuclease T2 family histidine active sites

The fungal ribonucleases T2 from Aspergillus oryzae, M from Aspergillus saitoi and Rh from Rhizopus niveus are
- Campylobacter jejuni cell binding factor 2 (CBF2), a secreted antigen.

- Bacillus subtilis hypothetical protein yacD.

- Helicobacter pylori hypothetical protein HP0175.
- A hypothetical slime mold protein.

- Consensus pattern: F-[GSADEI]-x-[LVAQ]-A-x(3)-[ST]-x(3,4)-[STQ]-x(3,5)-[GER]-G-x-[LIVM]-[GS]

[1] Fischer G., Schmid F.X. Biochemistry 29:2205-2212(1990).

[2] Rudd K.E., Sofia H.J., Koonin E.V., Plunkett G. III, Lazar S., Rouviere P.E. Trends Biochem. Sci. 20:14-15(1995).
[3] Rahfeld J.-U., Ruecknagel K.P., Schelbert B., Ludwig B., Hacker J., Mann K., Fischer G. FEBS Lett. 352:180-184(1994).

[1428] 571. (RrnaAD) Ribosomal RNA adenine dimethylases signature

A number of enzymes responsible for the dimethylation of adenosines in ribosomal RNAs (EC 2.1.1.48) have been found [1,2] to be evolutionary related. These enzymes are:

- Bacterial 16S rRNA dimethylase (gene ksgA), which acts in the biogenesis of ribosomes by catalyzing the dimethylation of two adjacent adenosines in the loop of a conserved hairpin near the 3'-end of 16S rRNA. Inactivation of ksgA leads to resistance to the aminoglycoside antibiotic kasugamycin.

- [...] resistance to macrolide-lincosamide-streptogramin B (MLS) antibiotics, such as erythromycin, by dimethylating the adenine residue at position 2058 of 23S rRNA, thus resulting in a reduced affinity between ribosomes and the MLS antibiotics.

- Caenorhabditis elegans hypothetical protein E02H1.1.

[...] N-terminal section and corresponds to a region that is probably involved in S-adenosyl methionine (SAM) binding.

- Consensus pattern: [...]

[1] [...] D.B., Weissbach K., Jones K.A., Eds., pp.19-36, Alan R. Liss Inc., New-York, (1990).

[2] Lafontaine D., Delcour J., Glasser A.L., Desgres J., Vandenhaute J. J. Mol. Biol. 241:492-497(1994).

!S r!.'" bisphosphato cadxjxylase. small chain. 206 members 

(14301 S73. ATP/GTP-bnding site motif A (P-loop) (ras) 

From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable
proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs. The best
conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an
alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally
referred to as the 'A' consensus sequence or the 'P-loop'. There are numerous ATP- or GTP-binding proteins in
which the P-loop is found. A number of protein families for which the relevance of the presence of such a motif has
been noted are listed below:

- ATP synthase alpha and beta subunits.
- Myosin heavy chains.
- Kinesin heavy chains and kinesin-like proteins.
- Dynamins and dynamin-like proteins.
- Guanylate kinase.
- Thymidine kinase. [...]
- Nitrogenase iron protein family (nifH/frxC).
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7].
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- ADP-ribosylation factors family.
- Bacterial dnaA protein.
- Bacterial recA protein.
- Bacterial recF protein.
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.).
- DNA mismatch repair proteins mutS family.
- Bacterial type II secretion system protein E.

Not all ATP- or GTP-binding proteins are picked up by this motif. A number of
proteins escape detection because the structure of their ATP-binding site is completely different from that of the P-loop.


[ 1] Engemann S., Herfurth E., Briesemeister U., Wittmann-Liebold B.
J. Protein Chem. 14:189-195(1995).

[1423] 567. Ribosomal protein S9 signature 

Ribosomal protein S9 is one of the proteins from the small ribosomal subunit. It belongs to a family of ribosomal proteins
which, on the basis of sequence similarities [1,2], groups: - Eubacterial S9. - Algal chloroplast S9.

- Cyanelle S9. - Archaebacterial S9. - Mammalian S16. - Plant S16.
- Yeast mitochondrial ribosomal S9.

A conserved region containing many charged residues and located in the central section of these proteins has been
selected as a signature pattern.

- Consensus pattern: G-G-G-x(2)-[GSA]-Q-x(2)-[SA]-x(3)-[GSA]-x-[GSTAV]-[KR]-[GSAL]-[LIF]

[ 1] Chan Y.-L., Paz V., Olvera J., Wool I.G. FEBS Lett. 263:85-88(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1424] 568. Ribulose-phosphate 3-epimerase family signatures 

Ribulose-phosphate 3-epimerase (EC 5.1.3.1) (also known as pentose-5-phosphate 3-epimerase or PPE) is the enzyme
that converts D-ribulose 5-phosphate into D-xylulose 5-phosphate in Calvin's reductive pentose phosphate cycle.
In Alcaligenes eutrophus two copies of the gene coding for PPE are known [1]: one is chromosomally encoded (cbbEC),
the other one is on a plasmid (cbbEP). PPE has been found in a wide range of bacteria, archaebacteria, fungi and plants.
The sequence of PPE is highly related to:

- Escherichia coli D-allulose-6-phosphate 3-epimerase (gene alsE).
- Escherichia coli protein sgcE.
- Mycoplasma genitalium hypothetical protein MG112.

All these proteins have from 209 to 241 amino acid residues. 

Two conserved regions which are located respectively in the N-terminal and in the central part of these proteins have
been selected as signature patterns.

- Consensus pattern: [LIVMF]-H-[LIVMFY]-D-[LIVM]-x-D-x(1,2)-[FY]-[LIVM]-x-N-x-[STAV]

- Consensus pattern: [LIVMA]-x-[LIVM]-M-[ST]-[VS]-x-P-x(3)-G-Q-x-F-x(6)-[NK]-[LIVMC]

[ 1] Kusian B., Yoo J.G., Bednarski R., Bowien B.
J. Bacteriol. 174:7337-7344(1992).

[1425] 569. (Ricin B lectin) Similarity to lectin domain of ricin beta-chain, 3 copies.
[1426] This family consists of a triplicated domain involved in cell agglutination in ricin.
[1427] 570. (Rotamase) PpiC-type peptidyl-prolyl cis-trans isomerase signature 

Peptidyl-prolyl cis-trans isomerase (EC 5.2.1.8) (PPIase or rotamase) is an enzyme that accelerates protein folding
by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides [1]. Most characterized PPIases
belong to two families, the cyclophilin-type (see <PDOC00154>) and the FKBP-type (see <PDOC00426>). Recently
a third family has been discovered [2,3]. So far, the only biochemically characterized member of this family is the
Escherichia coli protein parvulin (gene ppiC), a small (92 residues) cytoplasmic enzyme that prefers amino acid residues
with hydrophobic side chains like leucine and phenylalanine in the P1 position of the peptide substrates. PpiC
is evolutionary related to a number of proteins that are also probably PPIases:

- Escherichia coli and Haemophilus influenzae ppiD. PpiD is a PPIase which contains a periplasmic ppiC-like domain
anchored to the inner membrane and which seems to be involved in the folding of outer membrane proteins.

- Escherichia coli surA. SurA is a periplasmic protein that contains two ppiC-like domains.

- Nitrogen-assimilating bacteria protein nifM, which is involved in the activation and stabilization of the
iron-component (nifH) of nitrogenase.

- Bacillus subtilis protein prsA, a membrane-bound lipoprotein involved in protein export.

- Lactococcus and Lactobacillus protease maturation protein prtM, a membrane-bound lipoprotein involved in the
maturation of a secreted serine proteinase.

- Yeast protein ESS1/PTF1 (processing/termination factor 1).

- Drosophila protein dodo (gene dod).

- Mammalian protein PIN1.



EP 1 033 405 A2 



directly to part of the 3' end of 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of
sequence similarities [1,2,3], groups: - Eubacterial S7.

- Algal and plant chloroplast S7. - Cyanelle S7. - Archaebacterial S7.

- Plant mitochondrial S7. - Mammalian S5. - Plant S5.
- Caenorhabditis elegans S5 (T05E11.1).

The best conserved region located in the N-terminal section of these proteins has been selected as a signature pattern.
- Consensus pattern: [...]-[LIVMFTA](2)-x(6)-G-K-[KR]-x(5)-[LIVMF]-[LIVMFC]-x(2)-[STAC]

[ 1] [...] Wittmann-Liebold B. Biol. Chem. Hoppe-Seyler 374:[...]

[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[ 3] Ignatovich O., Cooper M., Kulesza H.M., Beggs J.D. Nucleic Acids Res. 23:4616-4619(1995).

[1417] 564. Ribosomal protein S7e signature

[1418] A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities [1]. One of
these families consists of:

Mammalian S7. 
Xenopus S8. 
Insect S7. 

Yeast probable ribosomal protein S7 (N2212). 

- Fission yeast probable ribosomal protein S7 (SpAC18G6.13c). 

A conserved region located in the central
section and which is rich in charged residues was selected as a signature pattern.

[1419] Consensus pattern: [KR]-L-x-R-E-L-E-K-K-F-[SAP]-x-[KR]-H

[1420] [ 1] [...] Kumar V., Collins F.H. Nucleic Acids Res. 21:4147-4147(1993).
[1421] 565. Ribosomal protein S8 signature

Ribosomal protein S8 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S8 is known to bind
directly to 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities
[1], groups: - Eubacterial S8. - Algal and plant chloroplast S8.

- Cyanelle S8. - Archaebacterial S8. - Marchantia polymorpha mitochondrial S8.

- Mammalian S15A. - Plant S15A. - Yeast S22 (S24).

The best conserved region located in the C-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: [GE]-x(2)-[LIV](2)-[STY]-[ST]-x(2)-G-[LIVM](2)-x(4)-[AG]-[...]

[ 1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).
[1422] 566. Ribosomal protein S8e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian S8. - Caenorhabditis elegans S8 (F42C5.8). - Leishmania major S8.

- Plant S8. - Yeast S8 (S14) (Rp19). - Archaebacterial S8e.

These proteins have either about 220 amino acids (in eukaryotes) or about 125 amino acids (in archaebacteria). A
conserved region which is rich in positively charged residues has been
selected as a signature pattern.

- Consensus pattern: [KR]-x(2)-[ST]-G-[GA]-x(5)-[HR]-[KG]-[KR]-x-K-x-E-[LM]-G

- Consensus pattern: H-x-K-R-[LIVMF]-[SANK]-x-P-x(2)-[WY]-x-[LIVM]-x-[KRP]

[ 1] Fisher E.M., Beer-Romero P., Brown L.G., Ridley A., McNeil J.A., Lawrence J.B., Willard H.F., Bieber F.R.,
Page D.C. Cell 63:1205-1218(1990).

[ 2] Braun H.P., Emmermann M., Mentzel H., Schmitz U.K. Biochim. Biophys. Acta 1218:435-438(1994).
[1413] 560. Ribosomal protein S5 signature 

Ribosomal protein S5 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S5 is known to be
important in the assembly and function of the 30S ribosomal subunit. Mutations in S5 have been shown to increase
translational error frequencies. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities
[1,2], groups: - Eubacterial S5.

- Cyanelle S5. - Red algal chloroplast S5. - Archaebacterial S5.

- Mammalian S2 (LLrep3). - Caenorhabditis elegans S2 (C49H3.11).

- Drosophila S2. - Plant S2. - Yeast S4 (SUP44). - Fungi mitochondrial S5.

S5 is a protein of 166 to 254 amino-acid residues. The signature pattern for this protein is based on a conserved region,
rich in glycine residues, and located in the N-terminal section of these proteins.

- Consensus pattern: G-[KRQ]-x(3)-[FY]-x-[ACV]-x(2)-[LIVMA]-[LIVM]-[AG]-[DN]-x(2)-G-x-[LIVM]-G-x-[SAG]-x
(5,6)-[DEQ]-[LIVMA]-x(2)-A-[LIVMF]

[ 1] All-Robyn J.A., Brown N., Otaka E., Liebman S.W.
Mol. Cell. Biol. 10:6544-6553(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1414] 561. Ribosomal protein S6 signature 

Ribosomal protein S6 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S6 is known to bind
together with S18 to 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities, groups: - Eubacterial S6. - Red algal chloroplast S6.

- Cyanelle S6.

S6 is a protein of 95 to 208 amino-acid residues. The signature pattern for this protein is based on a conserved region
located in the N-terminal section of these proteins.

- Consensus pattern: G-x-[KRC]-[DENQRH]-L-[SA]-Y-x-[KRNSA]
[1415] 562. Ribosomal protein S6e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 

- Mammalian S6 [1]. - Drosophila S6 [2]. - Plant S6 [3]. - Yeast S10 (YS4).

- Halobacterium marismortui HS13 [4]. - Methanococcus jannaschii MJ1260.

S6 is the major substrate of protein
kinases in eukaryotic ribosomes [5]; it may have an important role in controlling cell growth and proliferation through
the selective translation of particular classes of mRNA.

These proteins have 135 to 249 amino acids.

A conserved stretch of 12 residues in the N-terminal part of these proteins has been selected as a signature pattern.
- Consensus pattern: [LIVM]-[STAMR]-G-G-x-D-x(2)-G-x-P-M

[ 1] Franco R., Rosenfeld M.G. J. Biol. Chem. 265:4321-4325(1990).

[ 2] Watson K.L., Konrad K.D., Woods D.F., Bryant P.J. Proc. Natl. Acad. Sci. U.S.A. 89:11302-11306(1992).
[ 3] Hansen G., Estruch J.J., Spena A. Nucleic Acids Res. 20:5230-5230(1992).

[ 4] Kimura M., Arndt E., Hatakeyama T., Hatakeyama T., Kimura J. Can. J. Microbiol. 35:195-199(1989).
[ 5] Bandi H.R., Ferrari S., Krieg J., Meyer H.E., Thomas G. J. Biol. Chem. 268:4530-4533(1993).

[1416] 563. Ribosomal protein S7 signature 

Ribosomal protein S7 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S7 is known to bind

These proteins have from 220 to 250 amino acids.

A conserved stretch in their N-terminal section was selected as a signature pattern.
- Consensus pattern: [LIV]-x-[GH]-R-[IV]-x-E-x-[SC]-L-x-D-L

[ 1] Liu J.H., Reid D.M.
Plant Physiol. 109:338-338(1995).

[1410] 557. Ribosomal protein S3 signature 

Ribosomal protein S3 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S3 is known [...]

- Algal and plant chloroplast S3. - Cyanelle S3. - Archaebacterial S3.

- Plant mitochondrial S3. - Vertebrate S3. - Insect S3.

- Caenorhabditis elegans S3 (C23G10.3). - Yeast S3 (Rp13).

S3 is a protein of 209 to 559 amino-acid residues.

A conserved region located in the C-terminal section has been selected as a signature pattern.

- Consensus pattern: (not legible)

[ 1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).
[1411] 558. Ribosomal protein S4 signature

Ribosomal protein S4 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S4 is known to bind
directly to 16S ribosomal RNA. Mutations in S4 have been shown to increase translational error frequencies.
It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups: - Eubacterial S4. [...]

- Cyanelle S4. - Archaebacterial S4. - Mammalian S9. - Yeast YS11 (SUP45).

- Marchantia polymorpha mitochondrial S4. - Dictyostelium discoideum rp1024.

- Yeast protein NAM9 [3]. NAM9 has been characterized as a suppressor for ochre mutations in mitochondrial DNA.
It could be a ribosomal protein that acts as a suppressor by decreasing translation accuracy.

S4 is a protein of [...] amino-acid residues (except for NAM9 which is much larger). The signature pattern for
this protein is based on a conserved region located in the central section of these proteins.

- Consensus pattern: [LIVM]-[DE]-x-R-[LI]-x(3)-[LIVMC]-[VMFYHQ]-[KRT]-x(3)-[STAGCVF]-[...]

[ 1] Mizuta K., Hashimoto T., Suzuki K.I., Otaka E. Nucleic Acids Res. 19:2603-2608(1991).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[ 3] Boguta M., Dmochowska A., Borsuk P., Wrobel K., Gargouri A., Lazowska J., Slonimski P., Szczesniak B.,
Kruszewska A. Mol. Cell. Biol. 12:402-412(1992).

[1412] 559. Ribosomal protein S4e signature

- [...] chromosome Y, and [...]

- Plant cytoplasmic S4 [2]. - Yeast S7 (YS6). - Archaebacterial S4e.

These proteins have 233 to 264 amino acids.

A highly conserved stretch of 15 residues in their N-terminal section has been selected as a signature pattern. Four
positions in this region are positively charged residues.

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of: 

- Vertebrate S24 [1]. - Yeast Rp50. - Mucor racemosus S24 [2].

- Halobacterium marismortui HS15 [3]. - Methanococcus jannaschii MJ0394.
These proteins have 101 to 148 amino acids.

A well conserved stretch in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: [FYA]-G-x(2)-[KR]-[STA]-x-G-[FY]-[GA]-x-[LIVM]-Y-[DN]-[SDN]

[ 1] Brown S.J., Jewell A., Maki C.G., Roufa D.J. Gene 91:293-296(1990).
[ 2] Sosa L., Fonzi W.A., Sypherd P.S.
[1406] Nucleic Acids Res. 17:9319-9331(1989).
[ 3] Kimura J., Arndt E., Kimura M. FEBS Lett. 224:65-70(1987).
[1407] 554. Ribosomal protein S26e signature 

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families 
consists of: - Mammalian S26 [1].

- Octopus S26 [2]. - Drosophila S26 (DS31) [3]. - Plant cytoplasmic S26.

- Fungi S26 [4].

These proteins have 114 to 127 amino acids. 

A conserved octapeptide in the central part of these proteins has been selected as a signature pattern. 



- Consensus pattern: [YH]-C-V-S-C-A-I-H

[ 1] Kuwano Y., Nakanishi O., Nabeshima Y., Tanaka T., Ogata K. J. Biochem. 97:983-992(1985).
[ 2] Zinov'eva R.D., Tomarev S.I. Dokl. Akad. Nauk SSSR 304:464-469(1989).
[ 3] Itoh N., Ohta K., Ohta M., Kawasaki T., Yamashina I. Nucleic Acids Res. 17:2121-2121(1989).
[ 4] Wu M., Tan H. Gene 150:401-402(1994).

[1408] 555. Ribosomal protein S28e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of:

- Mammalian S28 [1]. - Plant S28 [2]. - Fungi S33 [3].
- Methanococcus jannaschii MJ1202.

These proteins have from 64 to 78 amino acids.

A highly conserved nonapeptide from the C-terminal extremity of these proteins has been selected as a signature
pattern.



Consensus pattern: E-[ST]-E-R-E-A-R-x-L 
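
The consensus patterns in this section follow the PROSITE notation: residues separated by hyphens, `x` for any residue, `[...]` for a set of allowed residues, `{...}` for excluded residues, and `(n)` or `(n,m)` for repeat counts. As an illustration only, the sketch below translates such a pattern into a regular expression and searches a sequence with it. The function name `prosite_to_regex` and the test sequence are hypothetical and are not part of the original disclosure; the pattern is the S28e signature given above.

```python
import re

# Hypothetical helper (not from the source): translate a PROSITE-style
# consensus pattern into an equivalent Python regular expression.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)

def prosite_to_regex(pattern):
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        start, core, lo, hi, end = m.groups()
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{"):           # excluded residues
            core = "[^" + core[1:-1] + "]"
        if lo is not None:                   # repeat count x(n) or x(n,m)
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(parts)

# S28e signature from the text; the sequence is an invented example.
regex = prosite_to_regex("E-[ST]-E-R-E-A-R-x-L")
hit = re.search(regex, "MKESEREARGLQ")
print(regex, hit.group(0), hit.start() + 1)  # regex, matched segment, 1-based position
```

A regular expression is used here only because the notation maps onto it almost one to one; dedicated pattern-scanning tools handle additional cases that this sketch does not.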

[ 1] Chan Y.-L., Olvera J., Wool I.G.
Biochem. Biophys. Res. Commun. 179:314-318(1991).

[ 2] Hwang I., Goodman H.M. Plant Physiol. 102:1357-1358(1993).

[ 3] Hoekstra R., Ferreira P.M., Bootsman T.C., Mager W.H., Planta R.J. Yeast 8:949-959(1992).
[1409] 556. Ribosomal protein S3Ae signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian S3A (was originally known as v-fos transformation effector protein). - Caenorhabditis elegans S3A
(F56F3.5).

- Plant cytoplasmic S3A (CYC07) [1]. - Yeast Rp10 (PLC1 and PLC2).

- Fission yeast Rp10 (SpAC13G6.02c). - Methanococcus jannaschii MJ0980.

[1,2]. One of these families consists of: - Mammalian S19. - Drosophila S19.

- Ascaris lumbricoides S19g (ALEP-1) and S19s. - Yeast YS16 (RP55A and RP55B).
- Aspergillus S16. - Halobacterium marismortui HS12.

These proteins have 143 to 155 amino acids. 

A well conserved stretch of 20 residues in the C-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: P-x(6)-[SAN]-x(2)-[LIVMA]-x-R-x-[ALIV]-[LV]-Q-x-L-[EQ]

[ 1] Etter A., Aboutanos M., Tobler H., Mueller F.
Proc. Natl. Acad. Sci. U.S.A. 88:1593-1596(1991).

[ 2] Suzuki K., Olvera J., Wool I.G. Biochimie 72:299-302(1990).

[1400] 550. Ribosomal protein S2 signatures

Ribosomal protein S2 is one of the proteins from the small ribosomal subunit. S2 belongs to a family of ribosomal
proteins which, on the basis of sequence similarities [1,2], groups: - Eubacterial S2. - Algal and plant chloroplast S2.

- Cyanelle S2. - Archaebacterial S2.

- Higher eukaryotes P40 (previously thought to be a laminin receptor).

- Yeast NAB1. - Plant mitochondrial S2. - Yeast mitochondrial MRP4.

S2 is a protein of 235 to 394 amino-acid residues.

Two conserved regions have been selected as signature patterns. One is located in the N-terminal section and the
other in the central section.

- Consensus pattern: [LIVMFA]-x(2)-[LIVMFYC](2)-x-[STAC]-[GSTANQEKR]-[STALV]-[HY]-[LIVMF]-G

- Consensus pattern: P-x(2)-[LIVMF](2)-[LIVMS]-x-[GDN]-x(3)-[DENL]-x(3)-[LIVM]-x-E-x(4)-[GNQKRH]-[LIVM]-[AP]

[ 1] Davis S.C., Tzagoloff A., Ellis S.R.
J. Biol. Chem. 267:5508-5514(1992).

[ 2] Tohgo A., Takasawa S., Munakata H., Yonekura H., Hayashi N., Okamoto H. FEBS Lett. 340:133-138(1994).
[1401] 551. Ribosomal protein S21 signature 

[1402] Ribosomal protein S21 is one of the proteins from the small ribosomal subunit. So far S21 has only been
found in eubacteria. It is a protein of 55 to 70 amino-acid residues. A conserved region in the N-terminal section of the
protein has been selected as a signature pattern.

[1403] Consensus pattern: [DE]-x-A-[LIY]-[KR]-R-F-K-[KR]-x(3)-[KR]

[1404] 552. Ribosomal protein S21e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families 
consists of: - Mammalian S21 [1].

- Caenorhabditis elegans S21 (F37C12.11). - Rice S21 [2].

- Yeast S21 (Ys25) [3]. - Fission yeast S28 [4].

These proteins have 82 to 87 amino acids.

A perfectly conserved nonapeptide in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: L-Y-V-P-R-K-C-S-[SA]

[ 1] Bhat K.S., Morrison S.G. Nucleic Acids Res. 21:2939-2939(1993).
[ 2] Nishi R., Hashimoto H., Uchimiya H., Kato A.
Biochim. Biophys. Acta 1216:113-114(1993).
[ 3] Suzuki K., Otaka E. Nucleic Acids Res. 16:6223-6223(1988).
[ 4] Itoh T., Otaka E., Matsui K.A. Biochemistry 24:7418-7423(1985).

[1405] 553. Ribosomal protein S24e signature

- Yeast S18a and S18b (RP41; YS12).

The best conserved regions located in the C-terminal sections of these proteins have been selected as a signature 
pattern. 

- Consensus pattern: G-D-x-[LIV]-x-[LIVA]-x-[QEK]-x-[RK]-P-[LIV]-S

[ 1] Gantt J.S., Thompson M.D. J. Biol. Chem. 265:2763-2767(1990).
[ 2] Herfurth E., Hirano H., Wittmann-Liebold B.
Biol. Chem. Hoppe-Seyler 372:955-961(1991).

[ 3] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).



[1396] 546. Ribosomal protein S17e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 

- Vertebrates S17 [1]. - Drosophila S17 [2]. - Neurospora crassa S17 (crp-3).

- Yeast S17a (RP51A) and S17b (RP51B) [3]. - Methanococcus jannaschii MJ0245.

These proteins have from 63
(in archaebacteria) to 130 to 146 amino acids and are highly conserved. A region in the central part of these proteins
has been selected as a signature.

- Consensus pattern: A-x-I-x-[ST]-K-x-L-R-N-[KR]-I-A-G-[FY]-x-T-H

[ 1] Chen I.-T., Roufa D.J. Gene 70:107-116(1988).
[ 2] Maki C., Rhoads D.D., Stewart M.J., van Slyke B., Denell R.E.,
Roufa D.J. Gene 79:289-298(1989).
[ 3] Abovich N., Rosbash M.
Mol. Cell. Biol. 4:1871-1879(1984).

[1397] 547. Ribosomal protein S18 signature

Ribosomal protein S18 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S18 has been
involved in aminoacyl-tRNA binding [1]. It appears to be situated at the tRNA A-site of the ribosome. It belongs to a
family of ribosomal proteins which, on the basis of sequence similarities [2], groups: - Eubacterial S18. - Algal and plant
chloroplast S18. - Cyanelle S18.

As a signature pattern, a conserved region in the central section of the protein has
been selected. This region contains two basic residues which may be involved in RNA-binding.

- Consensus pattern: [IV]-[DY]-Y-x(2)-[LIVMT]-x(2)-[LIVM]-x(2)-[FYT]-[LIVM]-[ST]-[DERP]-x-[GY]-K-[LIVM]-x(3)-R-[LIVMAS]

[ 1] McDougall J., Choli T., Kruft V., Kapp U., Wittmann-Liebold B. FEBS Lett. 245:253-260(1989).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1398] 548. Ribosomal protein S19 signature

Ribosomal protein S19 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S19 is known to
form a complex with S13 that binds strongly to 16S ribosomal RNA. S19 belongs to a family of ribosomal proteins
which, on the basis of sequence similarities [1,2], groups: - Eubacterial S19.

- Algal and plant chloroplast S19. - Cyanelle S19. - Archaebacterial S19.
- Plant mitochondrial S19. - Eukaryotic S15 ('rig' protein).

S19 is a protein of 88 to 144 amino-acid residues. Our signature pattern is based on the few conserved positions
located in the C-terminal section of these proteins.



- Consensus pattern: [STDNQ]-G-[KRQM]-x(6)-[LIVM]-x(4)-[LIVM]-[GSD]-x(2)-[LF]-[GAS]-[DE]-F-x(2)-[ST]

[ 1] Kitagawa M., Takasawa S., Kikuchi N., Itoh T., Teraoka H., Yamamoto H., Okamoto H. FEBS Lett. 283:210-214
(1991).

[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).



[1399] 549. Ribosomal protein S19e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities

groups:



- Eubacterial S14.
- Algal and plant chloroplast S14.
- Cyanelle S14.
- Archaebacterial Methanococcus vannielii S14.
- Plant mitochondrial S14.
- Yeast mitochondrial MRP2.
- Mammalian S29.
- Yeast YS29A/B.



[1387] S14 is a protein of 53 to 115 amino-acid residues. Our signature pattern is based on the few conserved
positions located in the center of these proteins.

[1388] Consensus pattern: [RP]-x(0,1)-C-x(11,12)-[LIVMF]-x-[LIVMF]-[SC]-[RG]-x(3)-[RN]

[ 1] Chan Y.-L., Suzuki K., Olvera J., Wool I.G. Nucleic Acids Res. 21:649-655(1993).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1389] 543. Ribosomal protein S15 signature

Ribosomal protein S15 is one of the proteins from the small ribosomal subunit. In Escherichia coli, this protein binds
to 16S ribosomal RNA and functions at early steps in ribosome assembly. It belongs to a family of ribosomal proteins
which, on the basis of sequence similarities [1,2], groups: - Eubacterial S15.

- Archaebacterial Halobacterium marismortui HmaS15 (HS11).

- Plant chloroplast S15. - Yeast mitochondrial S28. - Mammalian S13.

- Brugia pahangi and Wuchereria bancrofti S13 (S15). - Yeast S13 (YS15).

S15 is a protein of 80 to 250 amino-acid residues.

A conserved region located in the C-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: [LIVM]-x(2)-H-[LIVMFY]-x(5)-D-x(2)-[SAGN]-x(3)-[LF]-x(9)-[LIVM]-x(2)-[FY]

[ 1] Dang H., Ellis S.R.
Nucleic Acids Res. 18:6895-6901(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1390] 544. Ribosomal protein S16 signature

[1391] Ribosomal protein S16 is one of the proteins from the small ribosomal subunit. It belongs to a family of
ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial S16.
- Algal and plant chloroplast S16.
- Cyanelle S16.
- Neurospora crassa mitochondrial S24 (cyt-21).

[1392] S16 is a protein of about 100 amino-acid residues. A conserved region located in the N-terminal extremity of
these proteins has been selected as a signature pattern.

[1393] Consensus pattern: [LIVMT]-x-[LIVM]-[KR]-L-[STAK]-R-x-G-[AKR]

[1394] [ 1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1395] 545. Ribosomal protein S17 signature

Ribosomal protein S17 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S17 is known to
bind specifically to the 5' end of 16S ribosomal RNA and is thought to be involved in the recognition of termination
codons. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3], groups: -
Eubacterial S17.

- Plant chloroplast S17 (nuclear encoded). - Red algal chloroplast S17.
- Cyanelle S17. - Archaebacterial S17. - Mammalian and plant cytoplasmic S11.

- Consensus pattern: [LIVMF]-x-[GSTAC]-[LIVMF]-x(2)-[GSTAL]-x(0,1)-[GSN]-[LIVMF]-x-[LIVM]-x [...]
x-[PA]-[STCH]-[DN]

[ 1] Kimura M., Kimura J., Hatakeyama T. FEBS Lett. 240:15-20(1988).
[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1382] 539. Ribosomal protein S12 signature

Ribosomal protein S12 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S12 is known to be
involved in the translation initiation step. It is a very basic protein of 120 to 150 amino-acid residues. S12 belongs to
a family of ribosomal proteins which, on the basis of sequence similarities [1], groups: - Eubacterial S12. -
Archaebacterial S12.

- Algal and plant chloroplast S12. - Cyanelle S12.

- Protozoa and plant mitochondrial S12. - Yeast S28.

- Drosophila mitochondrial protein tko (Technical KnockOut). - Mammalian S23.

The best conserved regions in these
proteins, located in the center of each sequence, have been selected as a signature pattern.

- Consensus pattern: [RK]-x-P-N-S-[AR]-x-R

[ 1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1383] 540. Ribosomal protein S12e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families
consists of: - Vertebrate S12 [1].

- Trypanosoma brucei S12 [2]. - Caenorhabditis elegans S12 (F54E7.2).
- Drosophila S12. - Yeast S12.

These proteins have 130 to 150 amino acids.

A conserved region in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: A-L-[KRQP]-x-V-L-x(2)-[SA]-x(3)-[DN]-G-L
[ 1] Lin A., Chan Y.-L., Jones R., Wool I.G.
J. Biol. Chem. 262:14343-14351(1987).
[ 2] Marchal C., Ismaili N., Pays E. Mol. Biochem. Parasitol. 57:331-334(1993).
[1384] 541. Ribosomal protein S13 signature
Ribosomal protein S13 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S13 is known to be
involved in binding fMet-tRNA and, hence, in the initiation of translation. It is a basic protein of 115 to 177 amino-acid
residues and belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups: -
Eubacterial S13.

- Plant chloroplast S13 (nuclear encoded). - Red algal chloroplast S13.

- Cyanelle S13. - Archaebacterial S13. - Plant mitochondrial S13.
- Mammalian and plant S18.

The best conserved regions in these proteins, located in their C-terminal part, have been selected as a signature pattern.

- Consensus pattern: [KRQS]-G-x-R-H-x(2)-[GSNH]-x(2)-[LIVMC]-R-G-Q

[ 1] Chan Y.-L., Paz V., Wool I.G.
Biochem. Biophys. Res. Commun. 178:1212-1218(1991).
[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1385] 542. Ribosomal protein S14p/S29e (Ribosomal protein S14 signature)

[1386] Ribosomal protein S14 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S14 is
known to be required for the assembly of 30S particles and may also be responsible for determining the conformation
of 16S rRNA at the A site. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2],

- Vertebrate L7A (SURF3) [1]. - Plant L7A. - Yeast L7A (YL5) (Rp6).

- Yeast protein NHP2 [2]. - Yeast hypothetical protein YEL026w.

- Bacillus subtilis hypothetical protein ylxQ. - Halobacterium marismortui Hs6.
- Methanococcus jannaschii MJ1203.

[1377] These proteins have 100 to 265 amino-acid residues.

A conserved region located in the central section has been selected as a signature pattern.

- Consensus pattern: [CA]-x(4)-[IV]-P-[FY]-x(2)-[LIVM]-x-[GSQ]-[KRQ]-x(2)-L-G

[ 1] Colombo P., Yon J., Garson K., Fried M. Proc. Natl. Acad. Sci. U.S.A. 89:6358-6362(1992).
[ 2] Kolodrubetz D., Burgum A. Yeast 7:79-90(1991).

[1378] 536. Ribosomal protein L9 signature

Ribosomal protein L9 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L9 is known to bind
directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2],
groups: - Eubacterial L9. - Cyanobacterial L9.

- Plant chloroplast L9 (nuclear-encoded). - Red algal chloroplast L9.

A conserved region, located in the N-terminal section of these proteins, has been selected as a signature pattern.

- Consensus pattern: G-x(2)-[GN]-x(4)-V-x(2)-G-[FY]-x(2)-N-[FY]-L-x(5)-[GA]-x(3)-[STN] 
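
Because each element of a consensus pattern stands for a fixed or bounded number of residues, the length of sequence a pattern can cover follows directly from its repeat counts. The sketch below, offered only as an illustration, sums those counts to give the shortest and longest possible match; the function name `pattern_length_bounds` is hypothetical and not part of the original disclosure. Applied to the L9 signature above it can serve as a quick consistency check on a transcribed pattern.

```python
import re

# Hypothetical helper (not from the source): compute how many residues a
# PROSITE-style pattern can span, from the repeat counts of its elements.
_REPEAT = re.compile(r"\((\d+)(?:,(\d+))?\)$")

def pattern_length_bounds(pattern):
    shortest = longest = 0
    for element in pattern.strip().rstrip(".").split("-"):
        m = _REPEAT.search(element)
        if m:
            lo = int(m.group(1))
            hi = int(m.group(2)) if m.group(2) else lo
        else:
            lo = hi = 1          # a bare residue, x, [..] or {..} covers one position
        shortest += lo
        longest += hi
    return shortest, longest

# L9 signature from the text: every repeat count is fixed, so both bounds agree.
l9 = "G-x(2)-[GN]-x(4)-V-x(2)-G-[FY]-x(2)-N-[FY]-L-x(5)-[GA]-x(3)-[STN]"
print(pattern_length_bounds(l9))
```

Patterns containing a range such as x(0,1) or x(5,6) give two different bounds, which is why both values are returned.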

[ 1] Hoffman D.W., Davies C., Gerchman S.E., Kycia J.H., Porter S.J., White S.W., Ramakrishnan V. EMBO J. 13:
205-212(1994).

[ 2] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[1379] 537. Ribosomal protein S10 signature

Ribosomal protein S10 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S10 is known to be
involved in binding tRNA to the ribosomes. It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities [1], groups: - Eubacterial S10.

- Algal chloroplast S10. - Cyanelle S10. - Archaebacterial S10.

- Marchantia polymorpha and Prototheca wickerhamii mitochondrial S10.

- Arabidopsis thaliana mitochondrial S10 (nuclear encoded). - Vertebrate S20.

- Plant S20. - Yeast URP2.

S10 is a protein of about 100 amino-acid residues.

[1380] A conserved region located in the center of these proteins has been selected as a signature pattern.
- Consensus pattern: [AV]-x(3)-[GDNSR]-[LIVMSTA]-x(3)-G-P-[LIVM]-x-[LIVM]-P-T

[ 1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).
[1381] 538. Ribosomal protein S11 signature

Ribosomal protein S11 [1] plays an essential role in selecting the correct tRNA in protein biosynthesis. It is located on
the large lobe of the small ribosomal subunit. S11 belongs to a family of ribosomal proteins which, on the basis of
sequence similarities, groups [2]: - Eubacterial S11.

- Algal and plant chloroplast S11. - Cyanelle S11. - Archaebacterial S11.
- Marchantia polymorpha and Prototheca wickerhamii mitochondrial S11.

- Acanthamoeba castellanii mitochondrial S11. - Neurospora crassa S14 (crp-2). - Yeast S14 (RP59 or CRY1).
- Mammalian, Drosophila, Trypanosoma, and plant S14.
- Caenorhabditis elegans S14 (F37C12.9).

One of the best conserved regions in these proteins was selected as a signature pattern.
L5 is a protein of about 180 amino-acid residues.

A conserved region, located in the first third of these proteins has been selected as a signature pattern. 

- Consensus pattern: [LIVM]-x(2)-[LIVM]-[STAVC]-[GE]-[QV]-x(2)-[LIVMA]-x-[STC]-x4

[ 1] Hatakeyama T., Hatakeyama T. Biochim. Biophys. Acta 1039:343-347(1990).
[ 2] Rosendahl G., Andreasen P.H., Kristiansen K. Gene 98:161-167(1991).
[ 3] Yang D., Gunther I., Matheson A.T., Auer J., Spicker G., Boeck A. Biochimie 73:679-682(1991).
[ 4] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1370] 532. Ribosomal L5P family C-terminus

[1371] This region is found associated with Ribosomal_L5. Number of members: 60
[1372] 533. Ribosomal protein L6 signatures

[1373] Ribosomal protein L6 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L6 is known
to bind directly to the 23S rRNA and is located at the aminoacyl-tRNA binding site of the peptidyltransferase center. It
belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3,4], groups: - Eubacterial L6.

- Algal chloroplast L6.
- Cyanelle L6.
- Archaebacterial L6.
- Marchantia polymorpha mitochondrial L6.
- Yeast mitochondrial YmL6 (gene MRPL6).
- Mammalian L9.
- Drosophila L9.
- Plants L9.
- Yeast L9 (YL11).

[1374] While all the above proteins are evolutionarily related, it is very difficult to derive a pattern that will find them
all. Two patterns were therefore created, the first to detect eubacterial, cyanelle and mitochondrial L6, the second to
detect archaebacterial L6 as well as eukaryotic L9.

- Consensus pattern: [PS]-[DENS]-x-Y-K-[GA]-K-G-[LIVM]

- Consensus pattern: Q-x(3)-[LIVM]-x(2)-[KR]-x(2)-R-x-F-x-D-G-[LIVM]-Y-[LIVM]-x(2)-[KR]

[1] Suzuki K., Olvera J., Wool I.G. Gene 93:297-300(1990).
[2] Schwank S., Harrer R., Schueller H.-J., Schweizer E. Curr. Genet. 24:135-140(1993).
[3] Golden B.L., Ramakrishnan V., White S.W. EMBO J. 12:4901-4908(1993).
[4] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
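
Because the L6 family is covered by two patterns rather than one, a sequence can be tested against both and reported under whichever group it matches. The sketch below is illustrative only: the regular expressions are hand transcriptions of the two consensus patterns as read from this section (the second is reconstructed from a damaged line and should be treated as an assumption), and the name `classify_l6` and the test sequences are hypothetical.

```python
import re

# Hand-transcribed regexes for the two L6 consensus patterns (assumed readings).
L6_SIGNATURES = {
    "eubacterial, cyanelle or mitochondrial L6":
        re.compile(r"[PS][DENS].YK[GA]KG[LIVM]"),
    "archaebacterial L6 or eukaryotic L9":
        re.compile(r"Q.{3}[LIVM].{2}[KR].{2}R.F.DG[LIVM]Y[LIVM].{2}[KR]"),
}


def classify_l6(sequence: str):
    """Return the names of every L6 group whose signature occurs in the sequence."""
    return [name for name, regex in L6_SIGNATURES.items() if regex.search(sequence)]


if __name__ == "__main__":
    # Synthetic sequences constructed to satisfy one pattern each; not real proteins.
    print(classify_l6("AAPDAYKGKGIAA"))
    print(classify_l6("AAQAAALAAKAARAFADGIYVAARAA"))
    print(classify_l6("AAAAAAAA"))
```

An empty result means that neither signature was found, which is consistent with the sequence lying outside the family but does not prove it.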

[1375] 534. Ribosomal protein L6e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of: 






- Mammalian ribosomal protein L6 (L6 was previously known as TAX-responsive enhancer element binding protein

- Caenorhabditis elegans ribosomal protein L6 (R151.3).

- Yeast ribosomal protein YL16A/YL16B.
- Mesembryanthemum crystallinum ribosomal protein YL16-like.

These proteins have 175 (yeast) to 287 (mammalian) amino acids. A highly conserved region in the central part of
these proteins has been selected as a signature pattern.

- Consensus pattern: N-x(2)-P-L-R-R-x(4)-[FY]-V-I-A-T-S-x-K
[1376] 535. Ribosomal protein L7Ae signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 
A conserved region in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: P-Y-E-[KR]-R-x-[LIVM]-[DE]-[LIVM](2)-[KR]

[ 1] Chan Y.-L., Paz V., Olvera J., Wool I.G.
Biochem. Biophys. Res. Commun. 192:649-853(1993).

[1366] 528. Ribosomal protein L39e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian L39 [1]. - Plants L39. - Yeast L46 [2]. - Archaebacterial L39e [3]. These proteins are very basic. About
50 residues long, they are the smallest proteins of eukaryotic-type ribosomes. A conserved region in the C-terminal
section of these proteins has been selected as a signature pattern.

- Consensus pattern: [KRA]-T-x(3)-[LIVM]-[KRQF]-x-[NHS]-x(3)-R-[NHY]-W-R-R

[ 1] Lin A., McNally J., Wool I.G. J. Biol. Chem. 259:487-490(1984).
[ 2] Leer R.J., van Raamsdonk-Duin M.M.C., Kraakman P., Mager W.H.,
Planta R.J. Nucleic Acids Res. 13:701-709(1985).
[ 3] Ramirez C., Louie K.A., Matheson A.T. FEBS Lett. 250:416-418(1989).

[1367] 529. Ribosomal L40e family 

Bovine L40 has been identified as a secondary RNA binding protein [1]. L40 is fused to a ubiquitin protein [2]. 
Number of members: 27 

[1] 

Medline: 88203200

RNA binding proteins of the large subunit of bovine mitochondrial ribosomes.
Piatyszek MA, Denslow ND, O'Brien TW;
Nucleic Acids Res 1986;16:2565-2583.

[2] Medline: 96011832 The carboxyl extensions of two rat ubiquitin fusion proteins are ribosomal proteins S27a and
L40.

Chan YL, Suzuki K, Wool IG;
Biochem Biophys Res Commun 1995;215:682-690.
[1368] 530. (Ribosomal_L44) Ribosomal protein L44e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian L44 [1]. - Trypanosoma brucei L44.
- Caenorhabditis elegans L44 (C09H10.2). - Fungal L44 (L41).
- Halobacterium marismortui LA [2].

These proteins have 92 to 105 amino-acid residues.

A conserved region located in the C-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: K-x-[TV]-K-K-x(2)-L-[KR]-x(2)-C

[ 1] Gallagher M.J., Chan Y.-L., Lin A., Wool I.G. DNA 7:269-273(1988).
[ 2] Bergmann U., Wittmann-Liebold B.
Biochim. Biophys. Acta 1173:195-200(1993).

[1369] 531. Ribosomal protein L5 signature 

Ribosomal protein L5 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L5 is known to be
involved in binding 5S RNA to the large ribosomal subunit. It belongs to a family of ribosomal proteins which, on the
basis of sequence similarities [1,2,3,4], groups: - Eubacterial L5.

- Algal chloroplast L5. - Cyanelle L5. - Archaebacterial L5. - Mammalian L11.

- Tetrahymena thermophila L21. - Slime mold L5 (V18). - Yeast L16 (39A).
- Plants mitochondrial L5.
sequence similarities, groups: - Eubacterial L34.

- Red algal chloroplast L34. - Cyanelle L34.
A conserved region that corresponds to the N-terminal half of L34 has been selected as a signature pattern.

- Consensus pattern: K-[RG]-T-[FYWL]-[EQS]-x(5)-[KRHS]-x(4,5)-G-F-x(2)-R

[ 1] Old I.G., Margarita D., Saint Girons I.
Nucleic Acids Res. 20:6097-6097(1992).
[1362] 524. Ribosomal protein L34e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 

- Mammalian L34. - Mosquito L31 [1]. - Plant L34 [2].

- Yeast putative ribosomal protein YIL052c. - Methanococcus jannaschii MJ0655. These proteins have 89 to 129
amino-acid residues.

A conserved region located in the N-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: Y-x-[ST]-x-S-[NY]-x(5)-[KR]-T-P-G

[ 1] Lan Q., Niu L.L., Fallon A.M.
Biochim. Biophys. Acta 1218:460-462(1994).
[ 2] Gao J., Kim S.R., Chung Y.Y., Lee J.M., An G.
Plant Mol. Biol. 25:761-770(1994).
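
The L34e description combines three criteria: a signature pattern, a length range of 89 to 129 residues, and a location in the N-terminal section. One way these might be checked together is sketched below. The regular expression is a hand transcription of the consensus pattern as read from this section, and treating "N-terminal section" as the first half of the sequence is an assumption made purely for illustration; the name `looks_like_l34e` is hypothetical.

```python
import re

# Hand transcription of Y-x-[ST]-x-S-[NY]-x(5)-[KR]-T-P-G.
L34E_REGEX = re.compile(r"Y.[ST].S[NY].{5}[KR]TPG")
L34E_LENGTHS = range(89, 130)  # 89 to 129 residues, inclusive


def looks_like_l34e(sequence: str) -> bool:
    """True if the length is in range and the signature starts in the first half."""
    hit = L34E_REGEX.search(sequence)
    if hit is None:
        return False
    in_range = len(sequence) in L34E_LENGTHS
    # Assumption: "N-terminal section" is approximated as the first half.
    n_terminal = hit.start() < len(sequence) / 2
    return in_range and n_terminal


if __name__ == "__main__":
    motif = "YASASNAAAAAKTPG"  # synthetic stretch built to satisfy the pattern
    print(looks_like_l34e("MA" + motif + "A" * 83))   # 100 residues, motif near N-terminus
    print(looks_like_l34e("A" * 85 + motif))          # 100 residues, motif near C-terminus
    print(looks_like_l34e("MA" + motif))              # too short
```

The thresholds can be tightened or relaxed; the point is only that length and position narrow the candidates that a bare pattern match would return.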

[1363] 525. Ribosomal protein L35Ae signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 

- Vertebrate L35A. - Caenorhabditis elegans L35A (F10E7.7). 

- Yeast L37A/L37B (Rp47). - Pyrococcus woesei L35A homolog [1]. 

These proteins have 87 to 110 amino-acid residues. 

A highly conserved stretch of 22 residues in the C-terminal part of these proteins has been selected as a signature
pattern.

- Consensus pattern: G-K-[LIVM]-x-R-x-H-G-x(2)-G-x-V-x-A-x-F-x(3)-[LI]-P

[ 1] Ouzounis C., Kyrpides N., Sander C.
Nucleic Acids Res. 23:565-570(1995).
[1364] 526. Ribosomal protein L36 signature

Ribosomal protein L36 is the smallest protein from the large subunit of the prokaryotic ribosome. It belongs to a family
of ribosomal proteins which, on the basis of sequence similarities [1], groups: - Eubacterial L36. - Algal and plant
chloroplast L36. - Cyanelle L36. L36 is a small basic and cysteine-rich protein of 37 amino-acid residues. As a signature
pattern, a conserved region that corresponds to positions 11 to 36 in L36 and includes three conserved cysteine residues
has been developed.

- Consensus pattern: C-x(2)-C-x(2)-[LIVM]-x-R-x(3)-[LIVMN]-x-[LIVM]-x-C-x(3,4)-[KR]-H-x-ax-Q-
[ 1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1365] 527. Ribosomal protein L36e signature 

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families 
consists of: - Mammalian L36 [1]. 

- Drosophila L36 (M(1)1B). - Caenorhabditis elegans L36 (F37C12.4).

- Candida albicans L39. - Yeast YL39. 



These proteins have 99 to 104 amino acids. 
- Drosophila L7. - Slime mold L7. - Mammalian L7. - Fungi L7 (YL8).
- Yeast mitochondrial L33.



L30 from eubacteria are small proteins of about 60 residues, those from archaebacteria are proteins of about 150
residues. Eukaryotic L7 are proteins of about 250 to 270 residues. The schematic relationship between the three groups
of proteins is shown below.

Eub. L30 fslxxxxxxxxxxC
Arc. L30 f^lxxxxxxxxxxxxxxxxxxxxxxxxxxxC
L7 NxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxC
position of the pattern.

The signature pattern for this family of ribosomal proteins spans the N-terminal half of the region common to all these
proteins.

- Consensus pattern: [IVT]-[LIVM]-x(2)-[LF]-x-[LI]-x-[KRHQEG]-x(2)-[STNQH]-x-{l\^^
VA]-x(2)-[LMFY]-[IVT]

[ 1] Mizuta K., Hashimoto T., Otaka E.
Nucleic Acids Res. 20:1011-1016(1992).
[1357] 520. Ribosomal protein L31 signature

Ribosomal protein L31 is one of the proteins from the large ribosomal subunit. L31 is a protein of 66 to 97 amino-acid
residues which has only been found so far in eubacteria and in some algal chloroplasts.

A conserved region located in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern: H-P-F-[FY]-[nhx(9)-G-R-[AIV]-x-[KRQ]
[1358] 521. Ribosomal protein L31e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian L31 [1]. - Chlamydomonas reinhardtii L31. - Yeast L34.
- Halobacterium marismortui HL30 [2].

These proteins have 87 to 128 amino-acid residues.

A conserved region, located in the central section, has been selected as a signature pattern.

- Consensus pattern: V-[KRHLIVM]-x(3)-[LIVM]-N-x-[AKH]-x-W-x-[KR]-G

AmdlT'^ ^" Kuwano Y., Kuzumaki T., Ishikawa K., Ogata K. Eur. J. Biochem. 162:45-48(1987).
[ 2] Bergmann U.
Biochim. Biophys. Acta 1050:56-60(1990).
[1359] 522. Ribosomal protein L33 signature

Ribosomal protein L33 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L33 has been shown
to be on the surface of 50S subunit. L33 belongs to a family of ribosomal proteins which, on the basis of sequence
similarities [1,2,3], groups: - Eubacterial L33.

- Algal and plant chloroplast L33. - Cyanelle L33.

L33 is a small protein of 49 to 66 amino-acid residues. A conserved region located in the central section of L33 has
been selected as a signature pattern.

- Consensus pattern: Y-x-[ST]-x-[KR]-[NS]-x(4)-[PATQ]-x(1,2)-[LIVMHEA]-x(2)-K-[FY]-[CSD]

[ 1] Kruft V., Kapp U., Wittmann-Liebold B. Biochimie 73:855-860(1991).
[ 2] Sharp P.M. Gene 139:129-130(1994).
[ 3] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1360] 523. Ribosomal protein L34 signature

[1361] Ribosomal protein L34 is one of the proteins from the large subunit of the prokaryotic ribosome. It is a small
basic protein of 44 to 51 amino-acid residues [1]. L34 belongs to a family of ribosomal proteins which, on the basis of



The schematic relationship between these groups of proteins is shown below.

Eub. L27 Nxxxxxxxxx
Algal L27 Nxxxxxxxxx
Plant L27 tttttNxxxxxxxxxxxxx
Yeast MRP7 tttNxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

't': transit peptide.
'N': N-terminal of mature protein.
'*': position of the pattern.

- Consensus pattern: G-x-[LIVM](2)-x-R-Q-R-G-x(5)-G

[ 1] Elhag G.A., Bourque D.P. Biochemistry 31:6856-6864(1992).
[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1353] 516. Ribosomal L28 family

The ribosomal 28 family includes L28 proteins from bacteria and chloroplasts. The L24 protein from yeast Swiss:
P36525 also contains a region of similarity to prokaryotic L28 proteins. L24 from yeast is also found in the large
ribosomal subunit.

Number of members: 24

[1354] 517. Ribosomal protein L29 signature

Ribosomal protein L29 is one of the proteins from the large ribosomal subunit. L29 belongs to a family of ribosomal 
proteins which, on the basis of sequence similarities [1], groups: - Eubacterial L29. - Red algal L29. 

- Archaebacterial L29. - Mammalian L35. - Caenorhabditis elegans L35 (ZK652.4).

- Yeast L35.

L29 is a protein of 63 to 138 amino-acid residues.

A conserved region located in the central section of L29 has been selected as a signature pattern.

- Consensus pattern: [KNQS]-[PSTL]-x(2)-[LIMFA]-[KRGSAN]-x-[LIWSTA]-[KR]-[KRHQSHDESTANRL]-[LIV]-A-
[KRCQVT]-[LIVMA]

[ 1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).
[1355] 518. Ribosomal protein L3 signature 

Ribosomal protein L3 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L3 is known to bind
to the 23S rRNA and may participate in the formation of the peptidyltransferase center of the ribosome. It belongs to
a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3,4], groups: - Eubacterial L3. - Red
algal L3. - Cyanelle L3.

- Archaebacterial Halobacterium marismortui HmaL3 (HL1).

- Yeast L3 (also known as trichodermin resistance protein) (gene TCM1).

- Arabidopsis thaliana L3 (genes ARP1 and ARP2). - Mammalian L3 (L4).

- Mammalian mitochondrial L3. - Yeast mitochondrial YmL9 (gene MRPL9).

A conserved region located in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern: [FL]-x(6)-[DN]-x(2)-[AGS]-x-[ST]-x-G-[KRH]-G-x(2)-G-x(3)-R

[ 1] Arndt E., Kroemer W., Hatakeyama T. J. Biol. Chem. 265:3034-3039(1990).
[ 2] Graack H.-R., Grohmann L., Kitakawa M., Schaefer K.L., Kruft V.
Eur. J. Biochem. 206:373-380(1992).
[ 3] Herwig S., Kruft V., Wittmann-Liebold B.
Eur. J. Biochem. 207:877-885(1992).
[ 4] Otaka E., Hashimoto T., Mizuta K., Suzuki K.
Protein Seq. Data Anal. 5:301-313(1993).

[1356] 519. Ribosomal protein L30 signature 

Ribosomal protein L30 is one of the proteins from the large ribosomal subunit. L30 belongs to a family of ribosomal
proteins which, on the basis of sequence similarities [1], groups: - Eubacterial L30. - Archaebacterial L30. 
a specific region on the 23S rRNA; in yeast, the corresponding protein binds to a homologous site on the 26S rRNA
[1]. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [2,3,4], groups: - Eubacterial
L23.

- Algal and plant chloroplast L23. - Archaebacterial L23. - Mammalian L23A.

- Caenorhabditis elegans L23A (F55D10.2). - Fungi L25.

- Yeast mitochondrial YmL41 (gene MRPL41 or MRP20).

[1349] A small conserved region in the C-terminal section of these proteins, which is probably involved in
rRNA-binding, has been selected as a signature pattern [2].

- Consensus pattern: [RK](2)-[AMHIVFYT]4IN0-[RK]-L-[STANEQK]-x(7)-[U\^FT^

[ 1] El Baradi T.T.A.L., Raue H.A., van de Regt C.KF., Verbree E.G.,
Planta R.J. EMBO J. 4:210-2107(1985).

[ 2] Raue H.A., Otaka E., Suzuki K. J. Mol. Evol. 28:418-426(1989).
[ 3] Fearon K., Mason T.L. J. Biol. Chem. 267:5162-5170(1992).
[ 4] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1350] 513. Ribosomal protein L24 signature 

Ribosomal protein L24 is one of the proteins from the large ribosomal subunit. L24 belongs to a family of ribosomal
proteins which, on the basis of sequence similarities, groups: - Eubacterial L24.

- Plant chloroplast L24 (nuclear-encoded). - Red algal L24. - Vertebrate L26.

- Yeast L26 (YL33). - Archaebacterial HmaL24 (HL15).

- A probable ribosomal protein from Sulfolobus acidocaldarius [1].

In their mature form, these proteins have 103 to 160 amino-acid residues.

A conserved stretch of 20 residues in their N-terminal section has been selected as a signature pattern.

- Consensus pattern: [GDEN]-D-x-V-x-[IV]-[LIVMA]-x-G-x(2)-[KRAHGNQ]-x(2,3)-[GA]-x-[IV]

[ 1] Ouzounis C., Kyrpides N., Sander C.
Nucleic Acids Res. 23:565-570(1995).

[1351] 514. Ribosomal protein L24e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists [1] of:

- Mammalian ribosomal protein L24.
- Yeast ribosomal protein L30A/B (Rp29) (YL21).
- Kluyveromyces lactis ribosomal protein L30.
- Arabidopsis thaliana ribosomal protein L24 homolog.
- Haloarcula marismortui ribosomal protein HL21/HL22.
- Methanococcus jannaschii MJ1201.

^fT'"' amino-acid residues. The most conserved region, which is located in the N-terminal
region of these proteins, has been selected as a signature pattern.

- Consensus pattern: [FY]-x-[GSH]-x(2)-[IV]-x-P-G-x-G-x(2)-[FYV]-x-[KRHE]-x-D

[ 1] Chan Y.-L., Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 202:1176-1180(1994).
[1352] 515. Ribosomal protein L27 signature 

Ribosomal protein L27 is one of the proteins from the large ribosomal subunit. L27 belongs to a family of ribosomal
proteins which, on the basis of sequence similarities [1,2], groups: - Eubacterial L27.

- Plant chloroplast L27 (nuclear-encoded). - Algal chloroplast L27.
- Yeast mitochondrial YmL2 (gene MRPL2 or MRP7).
- Consensus pattern: K-x(3)-[KRC]-x-[LIVM]-W-[IV]-[STNALV]-R-[LIVMHNS]-x(3)-[RKHS]

[ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K.
Protein Seq. Data Anal. 5:301-313(1993).
[1345] 509. Ribosomal protein L21e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian L21 [1]. - Entamoeba histolytica L21 [2].

- Caenorhabditis elegans L21 (C14B9.7). - Yeast L21E (URP1) [3].
- Halobacterium marismortui HL31 [4].

These proteins have 160 (eukaryotes) or 95 (archaebacteria) amino-acid residues. A conserved region in the central
part of these proteins has been selected as a signature pattern.

- Consensus pattern: G-[DE]-x-V-x(10)-[GV]-x(2)-[FYH]-x(2)-[FY]-x-G-x-T-G

[ 1] Devi K.R.G., Chan Y.-L., Wool I.G.
Biochem. Biophys. Res. Commun. 162:364-370(1989).
[ 2] Petter R., Rozenblatt S., Nuchamowitz Y., Mirelman D.
Mol. Biochem. Parasitol. 56:329-333(1992).

[ 3] Jank B., Waldherr M., Schweyen R.J. Curr. Genet. 23:15-18(1993). 
[ 4] Hatakeyama T., Kimura M. Eur. J. Biochem. 172:703-711(1988). 

[1346] 510. Ribosomal protein L21 signature 

Ribosomal protein L21 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L21 is known to bind
to the 23S rRNA in the presence of L20. It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities, groups: - Eubacterial L21.

- Marchantia polymorpha chloroplast L21. - Cyanelle L21.

- Spinach chloroplast L21 (nuclear-encoded).

Eubacterial L21 is a protein of about 100 amino-acid residues, the mature form of the spinach chloroplast L21 has 200
residues. A conserved region located in the C-terminal section of these proteins has been selected as a signature
pattern.

- Consensus pattern: [IVT]-x(3)-[KR]-x(3)-[KRQ]-K-x(6)-G-[HF]-R-[RQ]-x(2)-[ST]
[1347] 511. Ribosomal protein L22 signature

Ribosomal protein L22 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L22 is known to bind
23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3], groups: -
Eubacterial L22.

- Algal and plant chloroplast L22 (in legumes L22 is encoded in the nucleus instead of the chloroplast). - Cyanelle
L22. - Archaebacterial L22.

- Mammalian L17. - Plant L17. - Yeast YL17.

A conserved region located in the C-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: [RKQN]-x(4)-[RH]-[GAS]-x-G-[KRQS]-x(9)-[HDN]-[LIVM]-x-[LIVMS]-x-[LIVM]

[ 1] Gantt J.S., Baldauf S.L., Calie P.J., Weeden N.F., Palmer J.D.
EMBO J. 10:3073-3078(1991).
[ 2] Madsen L.H., Kreiberg J.D., Gausing K. Curr. Genet. 19:417-422(1991).
[ 3] Otaka E., Hashimoto T., Mizuta K., Suzuki K.
Protein Seq. Data Anal. 5:301-313(1993).



[1348] 512. Ribosomal protein L23 signature

Ribosomal protein L23 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L23 is known to bind



[1341] These proteins have 148 to 203 amino-acid residues.

A stretch of about 20 residues in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: CHKRhR-[LIVM]-x-[SA]-x(4)-[CV]-G-x(3)-[IV]-[WK]-[LIVF]-[DN]-P

[ 1] Chan Y.-L., Lin A., McNally J., Peleg D., Meyuhas O., Wool I.G.
J. Biol. Chem. 262:1111-1115(1987).
[ 2] Hart K., Klein T., Wilcox M.
Mech. Dev. 43:101-110(1993).
[ 3] Singleton C.K., Manning S.S., Ken R.
Nucleic Acids Res. 17:9679-9692(1989).
[1342] 506. Ribosomal protein L1e signature (Ribosomal_L4)

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists [1,2,3,4] of: - Vertebrate L1 (L4). - Drosophila L1. - Plant L1. - Yeast L2 (Rp2).

- Fission yeast L2. - Halobacterium marismortui HmaL4 (HL6).
- Methanococcus jannaschii MJ0177.

These proteins have 246 (archaebacteria) to 427 (human) amino acids. A conserved region in the N-terminal part of
these proteins has been selected as a signature pattern.

- Consensus pattern: N-x(3)-[KRM]-x(2)-A-[LIVT]-x-S-A-[LIV]-x-A-[ST]-[SGA]-x(7)-[RKHGS]-H

[ 1] Rafti R., Gargiulo G., Manzi A., Malva C., Graziani F.
Nucleic Acids Res. 17:456-456(1989).
[ 2] Presutti C., Villa T., Bozzoni I.
Nucleic Acids Res. 21:3900-3900(1993).
[ 3] Bagni C., Mariottini R., Annesi R., Amaldi R.
Biochim. Biophys. Acta 1216:475-478(1993).
[ 4] Arndt E., Kroemer W., Hatakeyama T. J. Biol. Chem. 265:3034-3039(1990).
[1343] 507. Ribosomal protein L2 signature

Ribosomal protein L2 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L2 is known to bind
to the 23S rRNA and to have peptidyltransferase activity. It belongs to a family of ribosomal proteins which, on the
basis of sequence similarities [1,2], groups: - Eubacterial L2.

- Algal and plant chloroplast L2. - Cyanelle L2. - Archaebacterial L2.

- Plant L2. - Slime mold L2. - Marchantia polymorpha mitochondrial L2.
- Paramecium tetraurelia mitochondrial L2. - Fission yeast K5, K37 and KD4.

- Yeast YL6. - Vertebrate L8.

The best conserved region located in the C-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: P-x(2)-R-G-[STAIV](2)-x-N-[APK]-x-[DE]

[ 1] Marty I., Meyer Y.
Nucleic Acids Res. 20:1517-1522(1992).
[ 2] Otaka E., Hashimoto T., Mizuta K., Suzuki K.
Protein Seq. Data Anal. 5:301-313(1993).

[1344] 508. Ribosomal protein L20 signature 

Ribosomal protein L20 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L20 is known to bind
directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1],
groups: - Eubacterial L20. - Algal and plant chloroplast L20.

- Cyanelle L20.

L20 is a protein of about 120 amino-acid residues. A conserved region located in the central section of these proteins
has been selected as a signature pattern.
[1336] 501. Ribosomal protein L17 signature

Ribosomal protein L17 is one of the proteins from the large ribosomal subunit. L17 belongs to a family of ribosomal
proteins which, on the basis of sequence similarities, groups: - Eubacterial L17.

- Yeast mitochondrial YmL8 (gene MRPL8).

Eubacterial L17 is a protein of 120 to 130 amino-acid residues. Yeast YmL8 is twice as large (238 residues); the sequence
of its N-terminal half is colinear with that of eubacterial L17. As a signature pattern, a conserved region in the N-terminal
section was selected.

- Consensus pattern: I-x-[ST]-[GT]-x(2)-[KR]-x-K-x(6)-[DE]-x-[LIMV]-[LIVMT]-T-x-[STAGHKR]
[1337] 502. Ribosomal protein L18e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 

- Vertebrate L18 (known as L14 in Xenopus) [1]. - Plant L18.
- Yeast L18 (Rp28). - Halobacterium marismortui HL29.
- Sulfolobus acidocaldarius HL29e.

These proteins have 115 to 187 amino-acid residues. A stretch of about 13 residues in the first third of these proteins
has been selected as a signature pattern.

- Consensus pattern: [KRE]-x-L-x(2)-[PSHKR]-x(2)-[RH]-[PSA]-x-[LIVM]-[NS]-[LIVM]-x-[RK]-[LIVM]

[ 1] Puder M., Barnard G.R., Staniunas R.J., Steele G.D. Jr., Chen L.B.
Biochim. Biophys. Acta 1216:134-136(1993).
[1338] 503. Ribosomal L18p family

It has been shown that the amino-terminal 93 amino acids of Swiss:P09895 are necessary and sufficient to bind 5S
rRNA in vitro. The carboxyl-terminal half of the protein, comprising amino acids 151-296, serves to localize the protein
to the nucleolus [1].
Number of members: 26 

[1] 

Medline: 96212235 

Distinct domains in ribosomal protein L5 mediate 5 S rRNA binding and nucleolar localization. 
Michael WM, Dreyfuss G; 
J Biol Chem 1996;271:11571-11574. 
[1339] 504. Ribosomal protein L19 signature

Ribosomal protein L19 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L19 is known to be
located at the 30S-50S ribosomal subunit interface and may play a role in the structure and function of the
aminoacyl-tRNA binding site. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities,
groups: - Eubacterial L19.

- Red algal chloroplast L19. - Cyanelle L19.

L19 is a protein of 120 to 130 amino-acid residues.

A conserved region in the C-terminal section has been selected as a signature pattern.

- Consensus pattern: [LIVM]-x-[KRGTI]-x-[GSAIHKRQDAHVG]-[RSN]-x(0,1)-[KRHSA]-[KY]-[KLI]-[LYS]-Y-[LIM]-R

[1340] 505. Ribosomal protein L19e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 

- Mammalian ribosomal protein L19 [1]. - Drosophila ribosomal protein L19 [2].
- Slime mold (D. discoideum) vegetative specific protein V14 [3].

- Yeast ribosomal protein L19 (YL14). - Archaebacterial ribosomal protein L19E.
[1328] [ 1] Chan Y.-L., Olvera J., Glueck A., Wool I.G. J. Biol. Chem. 269:5589-5594(1994).
[1329] 497. Ribosomal protein L13e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities [1]. One of these
families consists of:

- Vertebrate L13 (was previously known as Breast Basic Conserved protein 1 (BBC1)). - Drosophila L13. - Plant
L13. - Yeast probable L13 (YM9375.11C).

These proteins have 199 to 218 amino-acid residues. As a signature pattern, a stretch of about 16 residues in the first
third of these proteins was selected.

- Consensus pattern: [KR]-Y-x(2)-K-[LIVM]-R-[STA]-G-[KR]-G-F-[ST]-L-x-E

[1330] [ 1] Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 201:102-107(1994).
[1331] 498. Ribosomal protein L14 signature

Ribosomal protein L14 is one of the proteins from the large ribosomal subunit. In eubacteria, L14 is known to bind
directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1],
groups: - Eubacterial L14. - Algal and plant chloroplast L14. - Cyanelle L14. - Archaebacterial L14. - Yeast L17A.
- Mammalian L23.

- Caenorhabditis elegans L23 (B0336.10). - Higher eukaryotes mitochondrial L14.

- Yeast mitochondrial YmL38 (gene MRPL38).

L14 is a protein of 119 to 137 amino-acid residues. As a signature pattern, a conserved region located in the C-terminal
half of these proteins was selected:

- Consensus pattern: [GA]-[LIV](3)-x(9,10)-[DNS]-G-x(4)-[FY]-x(2)-[NT]-x(2)-V-[LIV]

[1332] [ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[1333] 499. Ribosomal protein L15 signature

Ribosomal protein L15 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L15 is known to bind
the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups: -
Eubacterial L15. - Plant chloroplast L15 (nuclear-encoded).

- Archaebacterial L15. - Vertebrate L27a. - Tetrahymena thermophila L29.

- Fungi L27a (L29, CRP-1, CYH2).

L15 is a protein of 144 to 154 amino-acid residues. As a signature pattern, a conserved region was selected in the
C-terminal section of these proteins.

A-x(o)-[LIVM]-x(3)-G

[1334] [ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[1335] 500. Ribosomal protein L15e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities
[1]. One of these families consists of:

- Mammalian L15. - Insect L15. - Plant L15. - Yeast YL10 (L13) (Rp15r).
- Thermoplasma acidophilum L15.

These proteins have about 200 amino-acid residues. As a signature pattern, a conserved region located in the
central section was selected.

- Consensus pattern: [DEHKR]-A-R-x-L-G-[FY]-x-[SAP]-x(2)-G-[LIVMFY](4)-R-x-R-[IV]-x-R-G

[ 1] Zwickl P., Lupas A., Baumeister W.

Biochem. Biophys. Res. Commun. 209:684-688(1995). 
It is located at the end of an alpha helix thought to be involved in RNA-binding.

[1314] Consensus pattern: [IM]-x(2)-[LIVA]-x(2,3)-[LIVM]-G-x(2)-[LMS]-[GSNHHPTKRHKRAV]-G-x4UMn.p-
[DENSTKQ]

[ 1] Nikonov S.V., Nevskaya N., Eliseikina I.A., Fomenkova N.P., Nikulin A., Ossina N., Garber M., Jonsson B.-K.,
Briand C., Al-Karadaghi S., Svensson L.A., Aevarsson A., Liljas A. EMBO J. 15:1350-1359(1996).
[ 2] Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 220:954-957(1996).

[1315] 492. Ribosomal protein LI 0 signature 

Ribosomal protein L10 is one of the proteins from the large ribosomal subunit. L10 is a protein of 162 to 185
amino-acid residues which has only been found so far in eubacteria. A conserved region located in the N-terminal
section of these proteins was used as a signature pattern.

[1316] Consensus pattern: [DEH]-x(2)-[GS]-[LIVMF]-[STN]-[VA]-x-[DEQK]-[LIVMA]-x(2)-[LIM]-R
[1317] 493. Ribosomal protein L10e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of: - Vertebrate L10 (QM) [1]. - Plant L10. - Caenorhabditis elegans L10 (F10B5.1). -
Yeast L10 (QSR1). - Methanococcus jannaschii MJ0543. These proteins have 174 to 232 amino-acid residues. A
conserved region located in the central section was selected as a signature pattern.
[1318] Consensus pattern: R-x-A-[FYW]-G-K-[PA]-x-G-x(2)-A-R-V

[1] Chan Y.-L., Diaz J.-J., Denoroy L., Madjar J.-J., Wool I.G. Biochem. Biophys. Res. Commun. 225:952-956(1996).

[1319] 494. Ribosomal protein L11 signature 

[1320] Ribosomal protein L11 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L11 is
known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities [1,2], groups:



- Eubacterial L11.
- Plant chloroplast L11 (nuclear-encoded).
- Red algal chloroplast L11.
- Cyanelle L11.
- Archaebacterial L11.
- Mammalian L12.
- Plants L12.
- Yeast L12 (YL15).



[1321] L11 is a protein of 140 to 165 amino-acid residues. A conserved region located in the C-terminal section of
these proteins was selected as a signature pattern. In Escherichia coli, the C-terminal half of L11 has been shown [3]
to be in an extended and loosely folded conformation and is likely to be buried within the ribosomal structure.
[1322] Consensus pattern: [RKN]-x-[LIVM]-x-G-[ST]-x(2)-[SNQ]-[LIVM]-G-x(2)-[LIVM]-x(0,1)-[DENG]

[1] Pucciarelli G., Remacha M., Ballesta J.P.G. Nucleic Acids Res. 18:4409-4416(1990).
[2] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[3] Choli T. Biochem. Int. 19:1323-1338(1989).

[1323] 495. Ribosomal protein L7/L12 C-terminal domain 
[1324] [1] Leijonmarck M, Liljas A; J Mol Biol 1987;195:555-579.
[1325] 496. Ribosomal protein L13 signature

Ribosomal protein L13 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L13 is known to be
one of the early assembly proteins of the 50S ribosomal subunit. It belongs to a family of ribosomal proteins which, on
the basis of sequence similarities [1], groups:

- Eubacterial L13.
- Plant chloroplast L13 (nuclear-encoded).
- Red algal chloroplast L13.
- Archaebacterial L13.
- Mammalian L13a (Tum P198).
- Yeast Rp22 and Rp23.

[1326] L13 is a protein of 140 to 250 amino-acid residues. As a signature pattern, a conserved region was selected
located in the C-terminal section of these proteins.

[GDN]-...-[KRV]-[GK]-M-[LIV]-[PS]-x(4,5)-[GS]-[NQEKRA]-x(5)-[LIVM]-x-[AIV]-[LFY
Rhodanese (thiosulfate sulfurtransferase) (EC 2.8.1.1) [1,2] is an enzyme which catalyzes the transfer of the sulfane
atom of thiosulfate to cyanide, to form sulfite and thiocyanate. In vertebrates, rhodanese is a mitochondrial enzyme of
about 300 amino-acid residues involved in forming iron-sulfur complexes and cyanide detoxification. A cysteine residue
takes part in the catalytic mechanism. Some bacterial proteins closely related to rhodanese are also thought to express
a sulfotransferase activity. These are: - Azotobacter vinelandii rhdA. - Escherichia coli sseA [3]. - Saccharopolyspora
erythraea cysA [4]. - Synechococcus strain PCC 7942 rhdA [5]. RhdA is a periplasmic protein probably involved in the
transport of sulfur compounds. Two patterns for the rhodanese family were developed. They are based on highly
conserved regions, one which is located in the N-terminal region, the other at the C-terminal extremity of the enzyme.
[1308] Consensus pattern: [FY]-x(3)-H-[LIV]-P-G-A-x(2)-[LIVF]

Consensus pattern: [FY]-[DEAF]-G-[SA]-W-x-E-[FYW]

[1] Westley J. Meth. Enzymol. 77:285-291(1981).
[2] Weiland K.L., Dooley T.P. Biochem. J. 275:227-231(1991).
[3] Rudd K.E. Unpublished observations (1993).
[4] Donadio S., Shafiee A., Hutchinson C.R. J. Bacteriol. 172:350-360(1990).
[5] Laudenbach D.E., Ehrhardt D., Green L., Grossman A.R. J. Bacteriol. 173:2751-2760(1991).

[1309] 489. Ribonuclease III family signature

Prokaryotic ribonuclease III (EC 3.1.26.3) (gene rnc) [1] is an enzyme that digests double-stranded RNA. It is involved
in the processing of ribosomal RNA precursors and of some mRNAs. RNase III is evolutionary related [2] to the following
proteins:

- Fission yeast pac1, a ribonuclease that probably inhibits mating and meiosis by degrading a specific mRNA required
  for sexual development.
- Yeast ribonuclease III (gene RNT1), a dsRNA-specific nuclease that cleaves eukaryotic preribosomal RNA at various
  sites.
- Caenorhabditis elegans hypothetical protein F26E4.13.
- Paramecium bursaria chlorella virus 1 protein A464R.
- Synechocystis strain PCC 6803 hypothetical protein slr0346.
- Fission yeast hypothetical protein SpAC8A4.08c, a protein with an N-terminal helicase domain and a C-terminal
  RNase III domain.
- Caenorhabditis elegans hypothetical protein K12H4.8, a protein with the same structure as SpAC8A4.08c.

These proteins share regions of sequence similarity, one of which is a highly conserved stretch of 9 residues which has
been developed as a signature pattern.

[1310] Consensus pattern: [DEQ]-[KRQ]-[LM]-E-[FYW]-[LV]-G-D-[SAR]
[1] Nashimoto H., Uchida H. Mol. Gen. Genet. 201:25-29(1985).
[2] Mian I.S. Nucleic Acids Res. 25:3187-3195(1997).

[1311] 490. Rieske iron-sulfur protein signatures 

Ubiquinol-cytochrome c reductase (EC 1.10.2.2) (also known as the bc1 complex or complex III) is one of the electron
transport chains of mitochondria and of some aerobic prokaryotes; it catalyzes the oxidoreduction of ubiquinol and
cytochrome c. In the chloroplast of plants and in cyanobacteria, plastoquinone-plastocyanin reductase (EC 1.10.99.1)
(also known as the b6f complex) is functionally similar and catalyzes the oxidoreduction of plastoquinol and cytochrome
f. One of the components of these electron transfer systems is an iron-sulfur protein with a 2Fe-2S cluster, which is
called the Rieske protein [1,2]. The Rieske protein contains approximately 190 amino acid residues. The iron-sulfur
cluster is complexed to the protein through cysteine and histidine residues. Two perfectly conserved regions in Rieske
proteins contain all the residues that bind the iron-sulfur cluster. Both regions contain two cysteines and a histidine.
The first cysteine and the histidine are 2Fe-2S ligands while the remaining cysteines form a disulfide bond [3]. Two
conserved regions were selected as signature patterns.

[1312] Consensus pattern: C-[TK]-H-L-G-C-[LIVST] [The first C and the H are 2Fe-2S ligands] [The second C is
involved in a disulfide bond]

Consensus pattern: C-P-C-H-x-[GSA] [The first C and the H are 2Fe-2S ligands] [The second C is involved in a disulfide
bond]
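
As an illustration of how the two Rieske patterns could be applied to a sequence, the following minimal Python sketch translates them by hand into regular expressions and reports where each one occurs, together with the positions of the first cysteine and the histidine that the annotations identify as 2Fe-2S ligands. The names RIESKE_BOXES and find_rieske_ligands, the "box I"/"box II" labels and the test peptide are hypothetical; they are assumptions made for this example and are not taken from any database or tool.

```python
import re

# Hand translation of the two Rieske patterns into regular expressions.
# Each entry: (regex, offset of the first C in a match, offset of the H in a match).
RIESKE_BOXES = {
    "box I": (re.compile(r"C[TK]HLGC[LIVST]"), 0, 2),   # C-[TK]-H-L-G-C-[LIVST]
    "box II": (re.compile(r"CPCH.[GSA]"), 0, 3),        # C-P-C-H-x-[GSA]
}

def find_rieske_ligands(sequence):
    """Return {box name: [(start, cys_pos, his_pos), ...]} using 1-based positions."""
    hits = {}
    for name, (regex, cys_off, his_off) in RIESKE_BOXES.items():
        hits[name] = [
            (m.start() + 1, m.start() + cys_off + 1, m.start() + his_off + 1)
            for m in regex.finditer(sequence)
        ]
    return hits

# Made-up peptide containing one copy of each box (illustration only).
demo = "AAACTHLGCVAAAAACPCHGSAA"
print(find_rieske_ligands(demo))
```

A sequence that yields one hit for each box would, under these assumptions, carry both signature regions; a real search would normally be run with a dedicated pattern-scanning program.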

[1] Gatti D.L., Meinhardt S.W., Ohnishi T., Tzagoloff A. J. Mol. Biol. 205:421-435(1989).
[2] Kallas T., Spiller S., Malkin R. Proc. Natl. Acad. Sci. U.S.A. 85:5794-5798(1988).
[3] Iwata S., Saynovits M., Link T.A., Michel H. Structure 4:567-579(1996).

[1313] 491. Ribosomal protein L1 signature

Ribosomal protein L1 is the largest protein from the large ribosomal subunit. In Escherichia coli, L1 is known to bind to
the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:
- Eubacterial L1. - Algal and plant chloroplast L1. - Cyanelle L1. - Archaebacterial L1. - Vertebrate L10a. - Yeast
SSM1. As a signature pattern, the best conserved region was selected, located in the central section of these proteins.
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In
archaebacteria, there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of
10 to 13 polypeptides. A component of 14 to 18 Kd shared by all three forms of eukaryotic RNA polymerases and which has
been sequenced in budding yeast (gene RPB6 or RPO26), in fission yeast (gene rpb6 or rpo15), in human and in African
swine fever virus [1] is evolutionary related [2] to archaebacterial subunit K (gene rpoK). The archaebacterial protein
is colinear with the C-terminal part of the eukaryotic subunit.
[1299] Consensus pattern: [ST]-x-[FY]-E-x-[AT]-R-x-[LIVM]-[GSA]-x-R-[SA]-x-Q

[1] Lu Z., Kutish G.F., Sussman M.D., Rock D.L. Nucleic Acids Res. 21:2940-2940(1993).
[2] McKune K., Woychik N.A. J. Bacteriol. 176:4754-4756(1994).

[1300] 483. RNA polymerases L / 13 to 16 Kd subunits signature

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In
archaebacteria, there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of
10 to 13 polypeptides. It has been shown that small subunits of about 13 to 16 Kd found in all three types of
eukaryotic polymerases are highly conserved. Subunits known to belong to this family are: - Budding yeast RPC19 subunit
from RNA polymerases I and III [1]. - Budding yeast RPB11 subunit from RNA polymerase II [2]. - Mammalian RPB11 (gene
POLR2K) from RNA polymerase II. - Caenorhabditis elegans hypothetical protein F58A4.9. - Methanococcus jannaschii
RNA polymerase subunit L (gene rpoL). - Sulfolobus acidocaldarius RNA polymerase subunit L (gene rpoL) [3]. As a
signature pattern a conserved region was selected which is located at the N-terminal extremity of these polymerase
subunits; this region contains two cysteines that could play a role in the binding of a metal ion.
[1301] Consensus pattern: [DE](2)-H-[ST]-[LIVM]-[GAP]-N-x(11)-V-x-[FM]-x(2)-Y-x(3)-H-P

[1] Dequard-Chablat M., Riva M., Carles C., Sentenac A. J. Biol. Chem. 266:15300-15307(1991).
[2] Woychik N.A., McKune K., Lane W.S., Young R.A. Gene Expr. 3:77-82(1993).
[3] Langer D. EMBL/GenBank: X70805.

[1302] 484. RNA polymerases N / 8 Kd subunits signature 

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In
archaebacteria, there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of
10 to 13 polypeptides. Archaebacterial subunit N (gene rpoN) [1] is a small protein of about 8 Kd; it is evolutionary
related [2] to an 8.3 Kd component shared by all three forms of eukaryotic RNA polymerases (gene RPB10 in yeast and
POLR2J in mammals) as well as to African swine fever virus protein CP80R [3]. As a signature pattern a conserved region
was selected which is located at the N-terminal extremity of these polymerase subunits; this region contains two
cysteines that could play a role in the binding of a metal ion.
[1303] Consensus pattern: [LIVMF](2)-P-[LIVM]-x-C-F-[ST]-C-G

[1] Langer D., Main J., Thuriaux P., Zillig W. Proc. Natl. Acad. Sci. U.S.A. 92:5768-5772(1995).
[2] McKune K., Woychik N.A. J. Bacteriol. 176:4754-4756(1994).
[3] Yanez R.J., Rodriguez J.M., Nogal M.L., Yuste L., Enriquez C., Rodriguez J.R., Vinuela E. Virology
208:249-278(1995).

[1304] 485. Ribonuclease HII 

[1] Mian IS; Nucleic Acids Res 1997;25:3187-3189.

[1305] 486. Ribonuclease PH signature 

Prokaryotic ribonuclease PH (EC 2.7.7.56) (RNase PH) [1] is a phosphorolytic exoribonuclease that removes nucleotide
residues following the -CCA terminus of tRNA and adds nucleotides to the ends of RNA molecules by using nucleoside
diphosphates as substrates. RNase PH is a conserved protein of about 240 amino-acid residues. It is evolutionary
related to Caenorhabditis elegans hypothetical protein B0564.1. As a signature pattern, the most highly conserved
region was selected, which is located in the central part of these proteins.

Consensus sequence: C-[DE]-[LIVM](2)-Q-[GTA]-D-G-[SG]-x(2)-[TA]-A

[1] Kelly K.O., Deutscher M.P. J. Biol. Chem. 267:17153-17158(1992).

[1306] 487. RanBP1 domain

[1] Di Matteo G, Fuschi P, Zerfass K, Moretti S, Ricordy R, Cenciarelli C, Tripodi M, Jansen-Durr P, Lavia P; Cell Growth
Differ 1995;6:1213-1224.

[1307] 488. Rhodanese signatures 
[1291] 478. (RIP) Shiga/ricin ribosomal inactivating toxins active site signature. A number of bacterial and plant toxins
act by inhibiting protein synthesis in eukaryotic cells. The toxins of the Shiga and ricin family inactivate 60S ribosomal
subunits by an N-glycosidic cleavage which releases a specific adenine base from the sugar-phosphate backbone of
28S rRNA [1,2,3]. The toxins which are known to function in this manner are:

- Shiga toxin from Shigella dysenteriae [4]. This toxin is composed of one copy of an enzymatically active A subunit
  and five copies of a B subunit responsible for binding the toxin complex to specific receptors on the target cell
  surface.
- Shiga-like toxins (SLT) are a group of Escherichia coli toxins very similar in their structure and properties to
  Shiga toxin. The sequence of two types of these toxins, SLT-1 [5] and SLT-2 [6], is known.
- Ricin, a potent toxin from castor bean seeds. Ricin consists of two glycosylated chains linked by a disulfide bond.
  The A chain is enzymatically active. The B chain is a lectin with a binding preference for galactosides. Both chains
  are encoded by a single polypeptidic precursor. Ricin is classified as a type-II ribosome-inactivating protein (RIP);
  other members of this family are agglutinin, also from castor bean, and abrin from the seeds of the bean Abrus
  precatorius [7].
- Single chain ribosome-inactivating proteins (type-I RIP) from plants. Examples of such proteins are: barley protein
  synthesis inhibitors I and II, mongolian snake-gourd trichosanthin, sponge gourd luffin-A and -B, garden four-o'clock
  MAP, common pokeberry PAP-S and soapwort saporin-6 [7].

All these toxins are structurally related. A conserved glutamic residue has been implicated [8] in the catalytic
mechanism; it is located near a conserved arginine which also plays a role in catalysis [9]. The signature that has been
developed for these proteins includes these catalytic residues.

[1292] Consensus pattern: [LIVMA]-x-[LIVMSTA](2)-x-E-[SAGV]-[STAL]-R-[FY]-[RKNQS]-x-[LIVM]-[EQS]-x(2)-[LIVMF]
[E and R are active site residues]

[1293] [1] Endo Y., Tsurugi K., Takeda Y., Ogasawara T., Igarashi K. Eur. J. Biochem. 171:45-50(1988).
[2] May M.J., Hartley M.R., Roberts L.M., Krieg P.A., Osborn R.W., Lord J.M. EMBO J. 8:301-308(1989).
[3] Funatsu G., Islam M.R., Minami Y., Sung-Sil K., Kimura M. Biochimie 73:1157-1161(1991).
[4] Strockbine N.A., Jackson M.P., Sung L.M., Holmes R.K., O'Brien A.D. J. Bacteriol. 170:1116-1122(1988).
[5] Calderwood S.B., Auclair R., Donohue-Rolfe A., Keusch G.T., Mekalanos J.J. Proc. Natl. Acad. Sci. U.S.A.
84:4364-4368(1987).
[6] Jackson M.P., Neill R.J., O'Brien A.D., Holmes R.K., Newland J.W. FEMS Microbiol. Lett. 44:109-114(1987).
[7] Barbieri L., Battelli M.G., Stirpe F. Biochim. Biophys. Acta 1154:237-282(1993).
[8] Hovde C.J., Calderwood S.B., Mekalanos J.J., Collier R.J. Proc. Natl. Acad. Sci. U.S.A. 85:2568-2572(1988).
[9] Monzingo A.R., Collins E.J., Ernst S.R., Irvin J.D., Robertus J.D. J. Mol. Biol. 233:705-715(1993).

[1294] 479. Bacterial RNA polymerase, alpha chain (RNA pol A bac) 

Members of this family include alpha subunits from eubacteria and alpha subunits from chloroplasts. The alpha subunit
of RNA polymerase consists of two independently folded domains, referred to as the amino-terminal and carboxyl-terminal
domains. The amino-terminal domain is involved in the interaction with the other subunits of the RNA polymerase. The
carboxyl-terminal domain interacts with the DNA and activators. The amino acid sequence of the alpha subunit is
conserved in prokaryotic and chloroplast RNA polymerases. There are three regions of particularly strong conservation,
two in the amino-terminal and one in the carboxyl-terminal [3].

[1] Zhang G, Darst SA; Science 1998;281:262-266.
[2] Jeon YH, Negishi T, Shirakawa M, Yamazaki T, Fujita N, Ishihama A, Kyogoku Y; Science 1995;270:1495-1497.
[3] Ebright RH, Busby S; Curr Opin Genet Dev 1995;5:197-203.
[4] Murakami K, Kimura M, Owens JT, Meares CF, Ishihama A; Proc Natl Acad Sci USA 1997;94:1709-1714.
[1295] 480. RNA polymerase beta subunit (RNA pol B) 

RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase
compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Each RNA polymerase
complex contains two related members of this family; in each case they are the two largest subunits.

[1] Falkenburg D, Dworniczak B, Faust DM, Bautz EK; J Mol Biol 1987;195:929-937.
[1296] 481. RNA polymerases H / 23 Kd subunits signature

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In
archaebacteria, there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of
10 to 13 polypeptides. Archaebacterial subunit H (gene rpoH) [1,2] is a small protein of about 8.5 to 10 Kd; it is
evolutionary related to the C-terminal part of a 23 Kd component shared by all three forms of eukaryotic RNA
polymerases (gene RPB5 in yeast and POLR2E in mammals). As a signature pattern a conserved region was selected which is
located at the N-terminal extremity of subunit H; this region contains two histidines that could play a role in the
binding of a metal ion.
[1297] Consensus pattern: H-[NEI]-[LIVM]-V-P-x-H-x(2)-[LIVM]-x(2)-[DE]

[1] Klenk H.-P., Palm R., Lottspeich R., Zillig W. Proc. Natl. Acad. Sci. U.S.A. 89:407-410(1992).
[2] Thiru A., Hodach M., Eloranta J.J., Kostourou V., Weinzierl R.O., Matthews S. J. Mol. Biol. 287:753-760(1999).

[1298] 482. RNA polymerases K / 14 to 18 Kd subunits signature

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
1 | Rpt. 2 | Rpt. 3 | Rpt. 4 | Rpt. 5 | Rpt. 6 | Rpt. 7 | C-terminal |

In Drosophila two signature patterns for RCC1 were developed. The first is found in the N-terminal part
of the second repeat; this is the most conserved part of RCC1. The second is derived from conserved positions in the
C-terminal part of each repeat and detects up to five copies of the repeated domain. The RCC1-type of repeat is also
found in the X-linked retinitis pigmentosa GTPase regulator [3].

[1282] Consensus pattern: G-x-N-D-x(2)-[AV]-L-G-R-x-T

Consensus pattern: [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM]
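
Because the second RCC1 pattern is described as detecting several copies of the repeated domain, a short sketch of counting its occurrences may be useful. The Python example below assumes the second pattern reads [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM] and translates it by hand into a regular expression. The names RCC1_REPEAT and rcc1_repeat_starts are hypothetical, and the test sequence is an artificial construct made of three identical units, not a real RCC1 protein.

```python
import re

# Hand translation of the assumed second RCC1 pattern:
# [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM]
RCC1_REPEAT = re.compile(r"[LIVMFA][STAGC]{2}G.{2}H[STAGLI][LIVMFA].[LIVM]")

def rcc1_repeat_starts(sequence):
    """Return the 1-based start positions of non-overlapping pattern matches."""
    return [m.start() + 1 for m in RCC1_REPEAT.finditer(sequence)]

# Artificial sequence: three copies of a matching unit separated by spacers.
unit = "LSSGAAHSLAL"
demo = (unit + "PPPPP") * 3
starts = rcc1_repeat_starts(demo)
print(len(starts), starts)
```

The number of start positions returned is the number of copies of the repeat that this pattern detects in the sequence.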

[1] Dasso M. Trends Biochem. Sci. 18:96-101(1993).
[2] Boguski M.S., McCormick F. Nature 366:643-654(1993).
[3] Roepman R., Van Duijnhoven G., Rosenberg T., Pinckers A.J.L.G., Bleeker-Wagemakers L.M., Bergen A.A.B., Post J.,
Beck A., Reinhardt R., Ropers H.-H., Cremers R., Berger W. Hum. Mol. Genet. 5:1035-1041(1996).

[1283] 474. RNA 3'-terminal phosphate cyclase signature (RCT)

RNA 3'-terminal phosphate cyclase (EC 6.5.1.4) [1,2] catalyzes the conversion of 3'-phosphate to a 2',3'-cyclic
phosphodiester at the end of RNA. The biological role of this enzyme is unknown but it is likely to function in some
aspects of cellular RNA processing. The reaction catalyzed by the enzyme occurs in three steps: 1) adenylation of the
enzyme by ATP; 2) the enzyme acts on RNA-3'-terminal phosphate to produce RNA-3'-terminal diphosphate adenylate; 3)
release of AMP and cyclisation by a non-catalytic nucleophilic attack by the adjacent 2'-hydroxyl on the phosphorus in
the diester linkage. This enzyme, which has been characterized in human (where there seem to be at least three
isozymes) and Escherichia coli (gene rtcA), seems to be taxonomically widespread. It is found in insects, plants, fungi
(gene RTC1 in yeast) and in archaebacteria. RNA cyclase is a protein of from 36 to 42 Kd. The best conserved region,
which is used as a signature pattern, is a glycine-rich stretch of residues located in the central part of the sequence
and which is reminiscent of various ATP, GTP or AMP glycine-rich loops. In this context, the conserved Arg (His in the
E. coli enzyme) could be the AMP-binding residue.
[1284] Consensus pattern: [RH]-G-x(2)-P-x-G(3)-x-[LIV]

[1] Genschik P., Billy E., Swianiewicz M., Filipowicz W. EMBO J. 16:2955-2967(1997).
[2] Filipowicz W., Vincente O. Meth. Enzymol. 181:499-510(1990).

[1285] 475. REV protein (anti-repression trans-activator protein) 

[1286] 476. Prokaryotic-type class I peptide chain release factors signature (RF-1)

Peptide chain release factors (RFs) are required for the termination of protein biosynthesis [1]. At present two classes
of RFs can be distinguished. Class I RFs bind to ribosomes that have encountered a stop codon at their decoding site
and induce release of the nascent polypeptide. Class II RFs are GTP-binding proteins that interact with class I RFs
and enhance class I RF activity. In prokaryotes there are two class I RFs that act in a codon-specific manner [2]: RF-1
(gene prfA) mediates UAA- and UAG-dependent termination while RF-2 (gene prfB) mediates UAA- and UGA-dependent
termination. RF-1 and RF-2 are structurally and evolutionary related proteins which have been shown [3] to make up
a family that also contains the following proteins: - Fungal MRF1, a mitochondrial RF (m-RF) which recognizes the
UAA and UAG codons. - Escherichia coli RF-H, a protein of unknown function. - Escherichia coli hypothetical protein
yaeJ and a close Pseudomonas putida homolog. A highly conserved region located in the central part of the 40 to 45
Kd RF-1/2 and m-RF and in the N-terminal of the 15 to 16 Kd RF-H and yaeJ is used as a signature pattern.
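
The codon specificity of the two prokaryotic class I release factors described above can be written down as a small lookup table. The following Python sketch is only an illustration of that stated specificity; the names CLASS_I_RF_CODONS and release_factors_for are hypothetical and chosen for this example.

```python
# Stop codons recognized by each prokaryotic class I release factor,
# as stated in the text: RF-1 -> UAA/UAG, RF-2 -> UAA/UGA.
CLASS_I_RF_CODONS = {
    "RF-1": {"UAA", "UAG"},
    "RF-2": {"UAA", "UGA"},
}

def release_factors_for(codon):
    """Return the sorted names of the class I RFs that recognize the codon."""
    codon = codon.upper().replace("T", "U")  # accept DNA-style spelling too
    return sorted(name for name, codons in CLASS_I_RF_CODONS.items() if codon in codons)

for stop in ("UAA", "UAG", "UGA"):
    print(stop, release_factors_for(stop))
```

UAA is the only stop codon recognized by both factors; a sense codon such as AUG returns an empty list.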
[1287] Consensus pattern: [AR]-[STA]-x-G-x-G-G-Q-[HNGCS]-V-N-x(3)-[ST]-A-[IV]

Note that prokaryotic-type class I RFs display no significant sequence similarity to prokaryotic-type class II RFs,
which belong to the family of GTP-binding elongation factors, nor to eukaryotic class I or class II RFs.

[1] Tate W.P., Poole E.S., Mannering S.M. Prog. Nucleic Acid Res. Mol. Biol. 52:293-335(1996).
[2] Craigen W.J., Lee C.C., Caskey C.T. Mol. Microbiol. 4:861-865(1990).
[3] Pel H.J., Rep M., Grivell L.A. Nucleic Acids Res. 20:4423-4428(1992).

[1288] 477. RIO1/ZK632.3/MJ0444 family signature

The following uncharacterized proteins are evolutionary related [1]: - Yeast protein RIO1. - Caenorhabditis elegans
hypothetical protein ZK632.3. - Methanococcus jannaschii hypothetical protein MJ0444. - Thermoplasma acidophilum
hypothetical protein in rpoA2 3'region. The eukaryotic members of this family are proteins of about 55 to 60 Kd while
the archaebacterial ones are half that size. The central part of these proteins is highly conserved. The best conserved
region is used as a signature pattern.

[1289] Consensus pattern: [LIVM]-V-H-[GA]-D-L-S-E-[FY]-N-x-[LIVM]
[1290] [1] Bairoch A. Unpublished observations (1997).
[1] Rivett A.J. Biochem. J. 291:1-10(1993).
[2] Rivett A.J. Arch. Biochem. Biophys. 268:1-8(1989).
[3] Goldberg A.L., Rock K.L. Nature 357:375-379(1992).
[4] Wilk S. Enzyme Protein 47:187-188(1993).
[5] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).
[6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1277] 471. (pyr redox) Pyridine nucleotide-disulphide oxidoreductases class-I active site
The pyridine nucleotide-disulphide oxidoreductases are FAD flavoproteins which contain a pair of redox-active
cysteines involved in the transfer of reducing equivalents from the FAD cofactor to the substrate. On the basis of
sequence and structural similarities [1] these enzymes can be classified into two categories. The first category groups
together the following enzymes [2 to 6]: - Glutathione reductase (EC 1.6.4.2) (GR). - Higher eukaryotes thioredoxin
reductase (EC 1.6.4.5). - Trypanothione reductase (EC 1.6.4.8). - Lipoamide dehydrogenase (EC 1.8.1.4), the E3
component of alpha-ketoacid dehydrogenase complexes. - Mercuric reductase (EC 1.16.1.1). The sequence around
the two cysteines involved in the redox-active disulfide bond is conserved and can be used as a signature pattern.

[1278] Consensus pattern: G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P [The two C's form the active site disulfide bond] In
positions 6 and 7 of the pattern all known sequences have Asn-(Val/Ile) with the exception of GR from plant chloroplasts
and from cyanobacteria which have Ile-Arg [7].
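
The remark about positions 6 and 7 can be checked mechanically once the pattern is written as a regular expression with the x(2) element captured. The Python sketch below assumes the pattern reads G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P; the names PYR_REDOX and classify_site and the short test peptides are hypothetical and serve only as an illustration.

```python
import re

# G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P, with the x(2) element
# (pattern positions 6 and 7) captured as group 1.
PYR_REDOX = re.compile(r"GG.C[LIVA](..)GC[LIVM]P")

def classify_site(sequence):
    """Return (residues at positions 6-7, description) or None if no match."""
    m = PYR_REDOX.search(sequence)
    if m is None:
        return None
    pair = m.group(1)
    if pair in ("NV", "NI"):
        kind = "Asn-(Val/Ile)"
    elif pair == "IR":
        kind = "Ile-Arg"
    else:
        kind = "other"
    return pair, kind

print(classify_site("AAGGTCVNVGCVPKK"))
print(classify_site("GGTCVIRGCVP"))
```

Under these assumptions, a match reporting "Ile-Arg" would correspond to the chloroplast or cyanobacterial GR exception noted in the text.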

[1] ..., Pahler A., Williams C.H. Jr., Model P. Nature 352:172-174.
[2] Rice D.W., Schulz G.E., Guest J.R. J. Mol. Biol. 174:483-496(1984).
[3] Brown N.L. Trends Biochem. Sci. 10:400-402(1985).
[4] Carothers D.J., Pons G., Patel M.S. Arch. Biochem. Biophys. 268:409-425(1989).
[5] Walsh C.T., Bradley M., Nadeau K. Trends Biochem. Sci. 16:305-309(1991).
[6] Gasdaska P.Y., Gasdaska J.R., Cochran S., Powis G. FEBS Lett. 373:5-9(1995).
[7] Creissen G., Edwards E.A., Enard C., Wellburn A., Mullineaux P. Plant J. 2:129-131(1991).

[1279] 472. DDC / GAD / HDC / TyrDC pyridoxal-phosphate attachment site (pyridoxal deC)
Three different enzymes - all pyridoxal-dependent decarboxylases - seem to share regions of sequence similarity
[1,2,3,4], especially in the vicinity of the lysine residue which serves as the attachment site for the
pyridoxal-phosphate (PLP) group. These enzymes are:

- Glutamate decarboxylase (EC 4.1.1.15) (GAD). Catalyzes the decarboxylation of glutamate into the neurotransmitter
  GABA (4-aminobutanoate).
- Histidine decarboxylase (EC 4.1.1.22) (HDC). Catalyzes the decarboxylation of histidine to histamine. There are two
  completely unrelated types of HDC: those that use PLP as a cofactor and those that contain a covalently bound
  pyruvoyl residue (found in Gram-positive bacteria).
- Aromatic-L-amino-acid decarboxylase (EC 4.1.1.28) (DDC), also known as L-dopa decarboxylase or tryptophan
  decarboxylase. DDC catalyzes the decarboxylation of tryptophan to tryptamine. It also acts on 5-hydroxy-tryptophan
  and dihydroxyphenylalanine (L-dopa).
- Tyrosine decarboxylase (EC 4.1.1.25) (TyrDC), which converts tyrosine into tyramine, a precursor of isoquinoline
  alkaloids and various amides.

These enzymes are collectively known as group II decarboxylases [3,4].

[1280] Consensus pattern: S-[LIVMFYW]-x(5)-K-[LIVMFYWG](2)-x(3)-[LIVMFYW]-x-[CA]-x(2)-[LIVMFYWQ]-x(2)-
[RK] [K is the pyridoxal-P attachment site]

[1] Jackson P.R. J. Mol. Evol. 31:325-329(1990).
[3] Sandmeier E., Hale T.I., Christen P. Eur. J. Biochem. 221:997-1002(1994).
[4] Ishii S., Mizuguchi H., Nishino J., Hayashi H., Kagamiyama H. J. Biochem. 120:369-376(1996).

[1281] 473. Regulator of chromosome condensation (RCC1) signatures (RCC1)

The regulator of chromosome condensation (RCC1) [1] is a eukaryotic protein which binds to chromatin and interacts
with ran, a nuclear GTP-binding protein, to promote the loss of bound GDP and the uptake of fresh GTP, thus acting
as a guanine-nucleotide dissociation stimulator (GDS) [2]. The interaction of RCC1 with ran probably plays an important
role in the regulation of gene expression. RCC1, known as PRP20 or SRM1 in yeast, pim1 in fission yeast and BJ1 in
Drosophila, is a protein that contains seven tandem repeats of a domain of about 50 to 60 amino acids. As shown in
the following schematic representation, the repeats make up the major part of the length of the protein. Outside the
repeat region, there is just a small N-terminal domain of about 40 to 50 residues and, in the Drosophila protein only, a
C-terminal domain of about 130 residues.

| N-t. | Rpt.
which belong to different phyla (ranging from fungi to mammals) is low, but the N-terminal region is relatively well
conserved. That region is thought to be involved in the binding to actin. The signature pattern for profilin is based on
conserved residues at the N-terminal extremity. A protein structurally similar to profilin is present in the genome of
variola and vaccinia viruses (gene A42R).

[1268] Consensus pattern: <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ]
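
The leading "<" in the profilin pattern restricts a match to the N-terminal extremity of the sequence. A minimal Python sketch of that behaviour follows, assuming the pattern reads <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ] and translating "<" into the regular-expression anchor "^". The names PROFILIN and has_profilin_signature and the test peptides are hypothetical.

```python
import re

# <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ]
# "<" anchors the pattern at the first residue, hence the leading "^".
PROFILIN = re.compile(r"^.?[STA].?W[DENQH].[YI].[DEQ]")

def has_profilin_signature(sequence):
    """True if the anchored pattern matches at the N-terminus of the sequence."""
    return PROFILIN.search(sequence) is not None

print(has_profilin_signature("MSWQTYVDE"))      # motif at the N-terminus
print(has_profilin_signature("AAAAMSWQTYVDE"))  # same motif, but internal
```

The second call does not match because the same residues are no longer at the start of the sequence, which is the effect the "<" anchor is meant to have.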

[1] Haarer B.K., Brown S.S. Cell Motil. Cytoskeleton 17:71-74(1990).
[2] Sohn R.H., Goldschmidt-Clermont P. BioEssays 16:465-472(1994).

[1269] 468. Protamine P1 signature 

Protamines are small, highly basic proteins that substitute for histones in sperm chromatin during the haploid phase
of spermatogenesis. They pack sperm DNA into a highly condensed, stable and inactive complex. There are two
different types of mammalian protamine, called P1 and P2. P1 has been found in all species studied, while P2 is
sometimes absent. There seems to be a single type of avian protamine whose sequence is closely related to that of
mammalian P1 [1]. As a signature for this family of proteins, a conserved region was selected at the N-terminal
extremity of the sequence.

[1270] Consensus pattern: [AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S
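
Consensus patterns written in this notation can be translated mechanically into regular expressions. The following Python sketch is one possible translation, given as an illustration rather than as the procedure used to derive or apply the signatures: it handles bracketed residue sets, "x" wildcards, repetition counts such as x(2,3), braces for excluded residues and the "<" and ">" terminus anchors. The function name prosite_to_regex and the test peptide are hypothetical.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into Python regular-expression syntax."""
    pattern = pattern.strip().rstrip(".-")  # tolerate a trailing period or hyphen
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # C-terminal anchor
            suffix, element = "$", element[:-1]
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        repeat = ""
        if lo is not None:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

regex = prosite_to_regex("[AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S")
print(regex)
print(re.search(regex, "MARYRCCRSQSRSR"))
```

The peptide in the last line is only an illustrative string chosen to contain the protamine P1 motif; any sequence can be substituted.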

[1271] [1] Oliva R., Goren R., Dixon G.H. J. Biol. Chem. 264:17627-17630(1989).

[1272] 469. Sperm histone P2 (protamine P2) 

This protein, also known as protamine P2, can substitute for histones in the chromatin of sperm. The alignment contains
both the sequence of the mature P2 protein and its propeptide.
[1273] 470. Proteasome A-type subunits signature

The proteasome (or macropain) (EC 3.4.99.46) [1 to 5,E1] is a eukaryotic and archaebacterial multicatalytic proteinase
complex that seems to be involved in an ATP/ubiquitin-dependent nonlysosomal proteolytic pathway. In eukaryotes the
proteasome is composed of about 28 distinct subunits which form a highly ordered ring-shaped structure (20S ring) of
about 700 Kd. Most proteasome subunits can be classified, on the basis of sequence similarities, into two groups, A
and B. Subunits that belong to the A-type group are proteins of from 210 to 290 amino acids that share a number of
conserved sequence regions. Subunits that are known to belong to this family are listed below.

- Vertebrate subunits C2 (nu), C3, C8, C9, iota and zeta.
- Drosophila PROS-25, PROS-28.1, PROS-29 and PROS-35.
- Yeast C1 (PRS1), C5 (PRS3), C7-alpha (Y8) (PRS2), Y7, Y13, PRE5, PRE6 and PUP2.
- Arabidopsis thaliana subunits alpha and PSM30.
- Thermoplasma acidophilum alpha-subunit. In this archaebacteria the proteasome is composed of only two different
  subunits.

As a signature pattern for proteasome A-type subunits the best conserved region was selected, which is
located in the N-terminal part of these proteins.

[1274] Consensus pattern: [FY]-x(4)-[STNV]-x-[FYW]-S-P-x-G-[RKH]-x(2)-Q-[LIVM]-[DE]-Y-[SAD]-x(2)-[SAG].
These proteins belong to family T1 in the classification of peptidases [6,E2].

[1] Rivett A.J. Biochem. J. 291:1-10(1993).
[2] Rivett A.J. Arch. Biochem. Biophys. 268:1-8(1989).
[3] Goldberg A.L., Rock K.L. Nature 357:375-379(1992).
[4] Wilk S. Enzyme Protein 47:187-188(1993).
[5] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).
[6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1275] Proteasome B-type subunits signature 

The proteasome (or macropain) (EC 3.4.99.46) [1 to 5,E1] is a eukaryotic and archaebacterial multicatalytic proteinase
complex that seems to be involved in an ATP/ubiquitin-dependent nonlysosomal proteolytic pathway. In eukaryotes
the proteasome is composed of about 28 distinct subunits which form a highly ordered ring-shaped structure (20S ring)
of about 700 Kd. Most proteasome subunits can be classified, on the basis of sequence similarities, into two groups,
A and B. Subunits that belong to the B-type group are proteins of from 190 to 290 amino acids that share a number of
conserved sequence regions. Subunits that are known to belong to this family are listed below.

- Vertebrate subunits C5, beta, delta, epsilon, theta (C10-II), LMP2/RING12, C13 (LMP7/RING10), C7-I and MECL-1.
- Yeast PRE1, PRE2 (PRG1), PRE3, PRE4, PRS3, PUP1 and PUP3.
- Drosophila L(3)73Ai.
- Fission yeast pts1.
- Thermoplasma acidophilum beta-subunit. In this archaebacteria the proteasome is composed of only two different
  subunits.

As a signature pattern for proteasome B-type subunits the best conserved region was selected, which is located in the
N-terminal part of these proteins.

[1276] Consensus pattern: [LIVMA]-[GSA]-[LIVMF]-x-[FYLVGAC]-x(2)-[GSACFY]-[LIVMSTAC](3)-[GAC]-
[GSTACV]-[DES]-x(15)-[RK]-x(12,13)-G-x(2)-[GSTA]-D. These proteins belong to family T1 in the classification of
peptidases [6,E2].
ysqualene-cycloartenol cyclase), a plant enzyme that catalyzes the cyclization of (S)-2,3-epoxysqualene to cycloartenol.
- Hopene synthase (EC 5.4.99.-) (squalene-hopene cyclase), a bacterial enzyme that catalyzes the cyclization
of squalene into hopene, a key step in hopanoid (triterpenoid) metabolism. These enzymes are evolutionary related [1]
proteins of about 70 to 85 Kd. As a signature pattern, a highly conserved region was selected which is rich in aromatic
residues and which is located in the C-terminal section.

[1261] Consensus pattern: [DE]-G-S-W-x-G-x-W-[GA]-[LIVM]-x-[FY]-x-Y-[GA]

[1262] [ 1] Corey E.J., Matsuda S.P.T., Bartel B. Proc. Natl. Acad. Sci. U.S.A. 90:11628-11632(1993).

[1263] 465. Prion protein signatures

Prion protein (PrP) [1,2,3] is a small glycoprotein found in high quantity in the brains of humans or animals infected
with a number of degenerative neurological diseases such as Kuru, Creutzfeldt-Jacob disease (CJD), scrapie or bovine
spongiform encephalopathy (BSE). PrP is encoded in the host genome and expressed both in normal and infected
cells. It has a tendency to aggregate, yielding polymers called rods. Structurally, PrP is a protein consisting of a signal
peptide, followed by an N-terminal domain that contains tandem repeats of a short motif (PHGGGWGQ in mammals,
PHNPGY in chicken), itself followed by a highly conserved domain. Finally comes a C-terminal hydrophobic domain
post-translationally removed when PrP is attached to the extracellular side of the cell membrane by a GPI-anchor. The
structure of PrP is shown in the following schematic representation:

[Schematic not recoverable from the source: signal peptide (Sig), tandem repeats, and a domain containing two cysteines (C).]

C: conserved cysteine involved in a disulfide bond. The schematic also marks the position of the patterns. As signature
patterns for PrP, a perfectly conserved alanine- and glycine-rich region of 16 residues was selected as well as a region
centered on the second cysteine involved in the disulfide bond.

[1264] Consensus pattern: A-G-A-A-A-A-G-A-V-V-G-G-L-G-G-Y

Consensus pattern: E-x-[ED]-x-K-[LIVM](2)-x-[KR]-[LIVM](2)-x-[QE]-M-C-x(2)-Q-Y [C is involved in a disulfide bond]

[ 1] Stahl N., Prusiner S.B. FASEB J. 5:2799-2807(1991).

[ 2] Brunori M., Chiara Silvestrini M., Pocchiari M. Trends Biochem. Sci. 13:309-313(1988).

[ 3] Prusiner S.B. Annu. Rev. Microbiol. 43:345-374(1989).
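
The prion entry above describes two sequence features that are easy to check mechanically: the tandem repeats of a short motif in the N-terminal domain, and the fixed 16-residue first signature. The Python sketch below illustrates one way to look for both. The function names and the toy sequence are assumptions made for illustration only; the sequence is synthetic and is not a real PrP sequence.

```python
import re

# First prion signature from paragraph [1264]; it contains no variable positions.
PRION_SIGNATURE_1 = "AGAAAAGAVVGGLGGY"


def count_tandem_repeats(sequence, motif="PHGGGWGQ"):
    """Return the longest uninterrupted run of `motif`, measured in copies."""
    run = re.compile("(?:" + re.escape(motif) + ")+")
    return max((len(m.group(0)) // len(motif) for m in run.finditer(sequence)), default=0)


def has_first_prion_signature(sequence):
    """True if the fixed 16-residue signature occurs anywhere in the sequence."""
    return PRION_SIGNATURE_1 in sequence


# Synthetic sequence: three mammalian-type repeats followed by the signature.
toy = "MK" + "PHGGGWGQ" * 3 + "TT" + PRION_SIGNATURE_1 + "LL"
print(count_tandem_repeats(toy), has_first_prion_signature(toy))
```

Passing motif="PHNPGY" would count the chicken-type repeat mentioned in the same paragraph.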

[1265] 466. Cyclophilin-type peptidyl-prolyl cis-trans isomerase signature and profile (pro isomerase)
Cyclophilin [1] is a high-affinity binding protein in vertebrates for the immunosuppressive drug cyclosporin A
(CSA). It exhibits a peptidyl-prolyl cis-trans isomerase activity (EC 5.2.1.8) (PPIase or rotamase). PPIase is an enzyme
that accelerates protein folding by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides
[2]. It is probable that CSA mediates some of its effects via an inhibitory action on PPIase. Cyclophilin is a cytosolic
protein which belongs to a family [3,4,5] that also includes the following isozymes: - Cyclophilin B (or S-cyclophilin), a
PPIase which is retained in an endoplasmic reticulum compartment. - Cyclophilin C, a cytoplasmic PPIase. - Mitochondrial
matrix cyclophilin (cyp3). - A PPIase which seems specific for the folding of rhodopsin and is an integral membrane
protein anchored by a C-terminal transmembrane region. This protein was first characterized in Drosophila (gene
ninaA). - Bacterial periplasmic PPIase (gene ppiA). - Bacterial cytosolic PPIase (gene ppiB). - Natural-killer cell
cyclophilin-related protein. This large protein (about 160 Kd) is a component of a putative tumor-recognition complex involved
in the function of NK cells. It contains a cyclophilin-type PPIase domain. - Mammalian nucleoporin Nup358 [6], a nuclear
pore protein of 358 Kd that contains a C-terminal cyclophilin-type PPIase domain. - Yeast hypothetical protein
[name illegible in source]. - Fission yeast hypothetical protein SpAC21E11.05c. - Caenorhabditis elegans hypothetical protein
T27D1.1. The sequences of the different forms of cyclophilin-type PPIases are well conserved. As a signature pattern
a conserved region was selected in the central part of these enzymes.

[1266] Consensus pattern: [FY]-x(2)-[STCNLV]-x-F-H-[RH]-[LIVMN]-[LIVM]-x(2)-F-[LIVM]-x-Q-[AG]-G. FKBPs are
also PPIases, but their sequence is not at all related to that of cyclophilin.

[ 1] Stamnes M.A., Rutherford S.L., Zuker C.S. Trends Cell Biol. 2:272-276(1992).

[ 2] Fischer G., Schmid F.X. Biochemistry 29:2205-2212(1990).

[ 3] Trandinh C.C., Pao G.M., Saier M.H. Jr. FASEB J. 6:3410-3420(1992).

[ 4] Galat A. Eur. J. Biochem. 216:689-707(1993).

[ 5] Hacker J., Fischer G. Mol. Microbiol. 10:445-456(1993).

[ 6] Wu J., Matunis M.J., Kraemer D., Blobel G., Coutavas E. J. Biol. Chem. 270:14209-14213(1995).
[1267] 467. Profilin signature 

Profilin [1,2] is a small eukaryotic protein that binds to monomeric actin (G-actin) in a 1:1 ratio, thus preventing the
polymerization of actin into filaments (F-actin). It can also, in certain circumstances, promote actin polymerization.
Profilin also binds to polyphosphoinositides such as PIP2. Overall sequence similarity among profilin
[ 2] Svendsen I., Boisen S., Hejgaard J. Carlsberg Res. Commun. 47:45-53(1982).

[ 3] Nozawa H., Yamagata H., Aizono Y., Yoshikawa M., Iwasaki T. J. Biochem. 106:1003-1008(1989).

[ 4] Cleveland T.E., Thornburg R.W., Ryan C.A. Plant Mol. Biol. 8:199-207(1987).

[ 5] Lee J.S., Brown W.E., Graham J.S., Pearce G., Fox E.A., Dreher T.W., Ahern K.G., Pearson G.D., Ryan C.A.
Proc. Natl. Acad. Sci. U.S.A. 83:7277-7281(1986).

[ 6] Seemuller U., Eulitz M., Fritz H., Strobl A. Hoppe-Seyler's Z. Physiol. Chem. 361:1841-1846(1980).

[ 7] Zeng F.-Y., Qian R.-Q., Wang Y. FEBS Lett. 234:35-38(1988).

[1258] 463. (pp binding) Phosphopantetheine attachment site

Phosphopantetheine (or pantetheine 4' phosphate) is the prosthetic group of acyl carrier proteins (ACP) in some
multienzyme complexes where it serves as a 'swinging arm' for the attachment of activated fatty acid and amino-acid
groups [1]. Phosphopantetheine is attached to a serine residue in these proteins [2]. ACP proteins or domains have
been found in various enzyme systems which are listed below (references are only provided for recently determined
sequences). - Fatty acid synthetase (FAS), which catalyzes the formation of long-chain fatty acids from acetyl-CoA,
malonyl-CoA and NADPH. Bacterial and plant chloroplast FAS are composed of eight separate subunits which
correspond to the different enzymatic activities; ACP is one of these polypeptides. Fungal FAS consists of two multifunctional
proteins, FAS1 and FAS2; the ACP domain is located in the N-terminal section of FAS2. Vertebrate FAS consists of a
single multifunctional enzyme; the ACP domain is located between the beta-ketoacyl reductase domain and the
C-terminal thioesterase domain [3]. - Polyketide antibiotics synthase enzyme systems. Polyketides are secondary
metabolites produced from simple fatty acids by microorganisms and plants. ACP is one of the polypeptidic components
involved in the biosynthesis of Streptomyces polyketide antibiotics actinorhodin, curamycin, granaticin, monensin,
oxytetracycline and tetracenomycin C. - Bacillus subtilis putative polyketide synthases pksK, pksL and pksM which
respectively contain three, five and one ACP domains. - The multifunctional 6-methylsalicylic acid synthase (MSAS)
from Penicillium patulum. This is a multifunctional enzyme involved in the biosynthesis of a polyketide antibiotic and
which contains an ACP domain in the C-terminal extremity. - Multifunctional mycocerosic acid synthase (gene mas)
from Mycobacterium bovis. - Gramicidin S synthetase I (gene grsA) from Bacillus brevis. This enzyme catalyzes the
first step in the biosynthesis of the cyclic antibiotic gramicidin S. - Tyrocidine synthetase I (gene tycA) from Bacillus
brevis. The reaction carried out by tycA is identical to that catalyzed by grsA. - Gramicidin S synthetase II (gene grsB)
from Bacillus brevis. This enzyme is a multifunctional protein that activates and polymerizes proline, valine, ornithine
and leucine. GrsB contains four ACP domains. - Erythronolide synthase proteins 1, 2 and 3 from Saccharopolyspora
erythraea which is involved in the biosynthesis of the polyketide antibiotic erythromycin. Each of these proteins contains
two ACP domains. - Conidial green pigment synthase from Aspergillus nidulans. - ACV synthetase from various fungi.
This enzyme catalyzes the first step in the biosynthesis of penicillin and cephalosporin. It contains three ACP domains.
- Enterobactin synthetase component F (gene entF) from Escherichia coli. This enzyme is involved in the ATP-dependent
activation of serine during enterobactin (enterochelin) biosynthesis. - Cyclic peptide antibiotic surfactin synthase
subunits 1, 2 and 3 from Bacillus subtilis. Subunits 1 and 2 contain three related domains while subunit 3 only contains
a single domain. - HC-toxin synthetase (gene HTS1) from Cochliobolus carbonum. This enzyme synthesizes HC-toxin,
a cyclic tetrapeptide. HTS1 contains four ACP domains. - Fungal mitochondrial ACP [9], which is part of the respiratory
chain NADH dehydrogenase (complex I). - Rhizobium nodulation protein nodF, which probably acts as an ACP in the
synthesis of the nodulation Nod factor fatty acyl chain. The sequence around the phosphopantetheine attachment site
is conserved in all these proteins and can be used as a signature pattern. A profile was also developed that spans the
complete ACP-like domain.

[1259] Consensus pattern: [DEQGSTALMKRH]-[LIVMFYSTAC]-[GNQ]-[LIVMFYAG]-[DNEKHS]-S-[LIVMST]-{PCFY}-
[STAGCPQLIVMF]-[LIVMATN]-[DENQGTAKRHLM]-[LIVMWSTA]-[LIVGSTACR]-x(2)-[LIVMFA] [S is the pantetheine
attachment site]

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).

[ 2] Pugh E.L., Wakil S.J. J. Biol. Chem. 240:4727-4733(1965).

[ 3] Witkowski A., Rangan V.S., Randhawa Z.I., Amy C.M., Smith S. Eur. J. Biochem. 198:571-579(1991).

[ 6] Scotti C., Piatti M., Cuzzoni A., Perani P., Tognoni A., Grandi G., Galizzi A., Albertini A.M. Gene 130:65-71
(1993).

[ 9] Sackmann U., Zensen R., Rohlen D., Jahnke U., Weiss H. Eur. J. Biochem. 200:463-469(1991).
[1260] 464. (Prenyltrans) Terpene synthases signature 

The following enzymes catalyze mechanistically related reactions which involve the highly complex cyclic
rearrangement of squalene or its 2,3 oxide: - Lanosterol synthase (EC 5.4.99.7) (oxidosqualene-lanosterol cyclase), which
catalyzes the cyclization of (S)-2,3-epoxysqualene to lanosterol, the initial precursor of cholesterol, steroid hormones
and vitamin D in vertebrates and of ergosterol in fungi (gene ERG7). - Cycloartenol synthase (EC 5.4.99.8) (2,3-epox-
C: conserved cysteine involved in a disulfide bond.

Consensus pattern: C-C-x(5)-R-x(2)-[FY]-x(2)-C [The three C's are involved in disulfide bonds]. The proteins
from the gamma-thionin family are not related to the above proteins and are described in a separate section.

[ 1] [authors and journal title partly illegible in source] Biochem. Biophys. 238:18-29(1985).

[ 2] [entry illegible in source]

[ 3] Bohlmann H., Apel K. Mol. Gen. Genet. 207:446-454(1987).

[ 4] Teeter M.M., Mazer J.A., L'Italien J.J. Biochemistry 20:5437-5443(1981).

[1255] 461. Polyprenyl synthetases signatures 

A variety of isoprenoid compounds are synthesized by various organisms. For example in eukaryotes the isoprenoid
biosynthetic pathway is responsible for the synthesis of a variety of end products including [illegible] ubiquinone
or coenzyme Q. In bacteria this pathway leads to the synthesis of isopentenyl tRNA, [illegible] and
sugar carrier lipids. Among the enzymes that participate in that pathway are a number of polyprenyl synthetase
enzymes which catalyze a 1'4-condensation between 5 carbon isoprene units. Currently [illegible]
enzymes is known: - Eukaryotic farnesyl pyrophosphate synthetase (FPP synthetase) [illegible]
and then with the resultant geranyl pyrophosphate to form farnesyl pyrophosphate. FPP synthetase [illegible]
- [illegible] synthetase (gene [illegible]). - Prokaryotic octaprenyl diphosphate
synthase (gene ispB). - Prokaryotic heptaprenyl diphosphate synthase [illegible]
- [illegible] geranylgeranyl pyrophosphate synthetase (GGPP synthetase) [illegible]
addition of the three molecules of [illegible] onto [illegible]
is a chloroplast enzyme involved in the biosynthesis of terpenoids; in fungi [illegible]
[illegible] the cyanelle genome of Cyanophora paradoxa.
- [illegible] pyrophosphate synthetase, which is involved in the biosynthesis of [illegible]
and catalyzes the formation of all trans-polyprenyl pyrophosphates generally ranging [illegible]
and 10 isoprene units depending on the species. HP synthetase is a mitochondrial membrane [illegible]. It
has been shown [1 to 5] that all the above enzymes share some regions of sequence similarity. [illegible]
Signature patterns were developed for both regions. Possible additional members of this family of proteins are [illegible]

[1256] Consensus pattern: [LIVM](2)-x-D-D-x(2,4)-D-x(4)-R-R-[GH]
Consensus pattern: [LIVMFY]-G-x(2)-[FYL]-Q-[LIVM]-x-D-D-[LIVMFY]-x-[DNG]

[ 1] Ashby M.N., Edwards P.A. J. Biol. Chem. 265:13157-13164(1990).

[ 2] [authors partly illegible in source] Horiuchi K., Nishino T. J. Biochem. 108:995-1000(1990).

[ 3] [authors illegible in source] J. Biol. Chem. 266:5854-[illegible](1991).

[ 4] [authors partly illegible in source] Schantz R., Camara B. [journal and pages illegible] (1992).

[ 5] [authors illegible in source] Proc. Natl. Acad. Sci. U.S.A. 89:6761-6764 [year illegible].

[ 6] Bairoch A. Unpublished observations (1993).

[1257] 462. Potato inhibitor I family signature 

[ 1] Svendsen I., Hejgaard J., Chavan J.K. Carlsberg Res. Commun. 49:493-502(1984).
[1246] Consensus pattern: [DN]-[LIV]-Y-x(3)-Y-Y-R [The second Y is the autophosphorylation site]
[1247] [ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988).
[1248] Receptor tyrosine kinase class III signature 

A number of growth factors stimulate mitogenesis by interacting with a family of cell surface receptors which possess
an intrinsic, ligand-sensitive, protein tyrosine kinase activity [1]. These receptor tyrosine kinases (RTK) all share the
same topology: an extracellular ligand-binding domain, a single transmembrane region and a cytoplasmic kinase
domain. However they can be classified into at least five groups. The class III RTK's are characterized by the presence
of five to seven immunoglobulin-like domains [2] in their extracellular section. Their kinase domain differs from that of
other RTK's by the insertion of a stretch of 70 to 100 hydrophilic residues in the middle of this domain. The receptors
currently known to belong to class III are: - Platelet-derived growth factor receptor (PDGF-R). PDGF-R exists as a
homo- or heterodimer of two related chains: alpha and beta [3]. - Macrophage colony stimulating factor receptor
(CSF-1-R) (also known as the fms oncogene). - Stem cell factor (mast cell growth factor) receptor (also known as the kit
oncogene). - Vascular endothelial growth factor (VEGF) receptors Flt-1 and Flk-1/KDR [4]. - Fl cytokine receptor
Flk-2/Flt-3 [5]. - The putative receptor Flt-4 [6]. A signature pattern was developed for this class of RTK's which is based
on a conserved region in the kinase domain.
[1249] Consensus pattern: G-x-H-x-N-[LIVM]-V-N-L-L-G-A-C-T

[ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988).

[ 2] Hunkapiller T., Hood L. Adv. Immunol. 44:1-63(1989).

[ 3] Lee K.-H., Bowen-Pope D.F., Reed R.R. Mol. Cell. Biol. 10:2237-2246(1990).

[ 4] Terman B.I., Dougher-Vermazen M., Carrion M.E., Dimitrov D., Armellino D.C., Gospodarowicz D., Boehlen
P. Biochem. Biophys. Res. Commun. 187:1579-1586(1992).

[ 5] Lyman S.D., James L., Vanden Bos T., de Vries P., Brasel K., Gliniak B., Hollingsworth L.T., Picha K.S., McKenna
H.J., Splett R.R. Cell 75:1157-1167(1993).

[ 6] Galland F., Karamysheva A., Pebusque M.-J., Borg J.-P., Rottapel R., Dubreuil P., Rosnet O., Birnbaum D.
Oncogene 8:1233-1240(1993).

[1250] Receptor tyrosine kinase class V signatures 

A number of growth factors stimulate mitogenesis by interacting with a family of cell surface receptors which possess
an intrinsic, ligand-sensitive, protein tyrosine kinase activity [1]. These receptor tyrosine kinases (RTK) all share the
same topology: an extracellular ligand-binding domain, a single transmembrane region and a cytoplasmic kinase
domain. However they can be classified into at least five groups on the basis of sequence similarities. The extracellular
domain of class V RTK's consists of a region of about 300 amino acids, amongst which are 16 conserved cysteines probably
involved in disulfide bonds; this region is followed by two copies of a fibronectin type III domain. The ligands for these
receptors are proteins of about 200 to 300 residues collectively known as Ephrins. The receptors currently known to
belong to class V are [2,3,E1]: - EPHA1 (Eph-1; Esk). - EPHA2 (Eck; Mpk-5; Sek-2). - EPHA3 (Etk-1; Hek; Mek4;
Tyro4; Rek4; Cek4). - EPHA4 (Sek; Hek8; Mpk-3; Cek8). - EPHA5 (Ehk-1; Hek7; Bsk; Cek7). - EPHA6 (Ehk-2). -
EPHA7 (Ehk-3; Hek11; Mdk-1; Ebk). - EPHA8 (Eek). - EPHB1 (Eph-2; Elk; Net). - EPHB2 (Eph-3; Hek5; Drl; Erk; Nuk;
Sek-3; Cek5; Qek5). - EPHB3 (Hek-2; Mdk-5). - EPHB4 (Htk; Mdk-2; Myk-1). - EPHB5 (Cek9). The EPHA subtype
receptors bind to GPI-anchored ephrins while the EPHB subtype receptors bind to type-I membrane ephrins. Two
signature patterns were developed for this class of RTK's, which each include some of the conserved cysteine residues.
[1251] Consensus pattern: F-x-[DN]-x-[GAW]-[GA]-C-[LIVM]-[SA]-[LIVM](2)-[SA]-[LV]-[KRHQ]-[LIVA]-x(3)-[KR]-C-
[PSAW] [The two C's are probably involved in disulfide bonds]

Consensus pattern: C-x(2)-[DE]-G-[DEQ]-W-x(2,3)-[PAQ]-[LIVMT]-[GT]-x-C-x-C-x(2)-G-[HFY]-[EQ] [The three C's are
probably involved in disulfide bonds]

[ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988).

[ 2] Sajjadi F.G., Pasquale E.B., Subramani S. New Biol. 3:769-778(1991).

[ 3] Wicks I.P., Wilkinson D., Salvaris E., Boyd A.W. Proc. Natl. Acad. Sci. U.S.A. 89:1611-1615(1992).

[1252] 459. Protein kinase C terminal domain 
[1253] 460. Plant thionins signature 

Thionins are small, basic, plant proteins generally toxic to animal cells [1]. They seem to exert their toxic effect at the
level of the cell membrane but their exact function is not known. They consist of a polypeptide chain of forty five to fifty
amino acids with three to four internal disulfide bonds. They are found in seeds but also in the cell wall of leaves [2].
Thionins are processed from larger precursor proteins [3]. Crambin [4], a hydrophobic plant seed protein, also belongs
to this family. The pattern to detect this family of proteins includes three of the six cysteine residues involved in disulfide
bonds.
served catalytic core common to both serine/threonine and tyrosine protein kinases. There are a number of conserved
regions in the catalytic domain of protein kinases. Two of these regions were selected to build signature patterns. The
first region, which is located in the N-terminal extremity of the catalytic domain, is a glycine-rich stretch of residues in
the vicinity of a lysine residue, which has been shown to be involved in ATP binding. The second region, which is
located in the central part of the catalytic domain, contains a conserved aspartic acid residue which is important for
the catalytic activity of the enzyme [6]. Two signature patterns were derived for that region: one specific for serine/
threonine kinases and the other for tyrosine kinases. A profile was also developed which is based on the alignment in
[1] and covers the entire catalytic domain.

[1243] Consensus pattern: [LIV]-G-{P}-G-{P}-[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]-{PD}-x-[GSTACLIVMFY]-
x(5,18)-[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K [K binds ATP]. The majority of known protein kinases belong to
the class detected by this pattern, but it fails to find a number of them, especially viral kinases, which are quite divergent
in this region and are completely missed by this pattern.

Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3) [D is an active site residue]. Most
serine/threonine specific protein kinases belong to this class detected by the pattern, with 10 exceptions (half of them viral
kinases) and also Epstein-Barr virus BGLF4 and Drosophila ninaC, which have respectively Ser and Arg instead of the
conserved Lys and which are therefore detected by the tyrosine kinase specific pattern described below.

[1244] Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3) [D is an active site
residue]. All tyrosine specific protein kinases with the exception of human ERBB3 and mouse blk belong to this class
detected by the pattern. This pattern will also detect most bacterial aminoglycoside phosphotransferases [8,9] and
herpesviruses ganciclovir kinases [10], which are proteins structurally and evolutionary related to protein kinases.
This profile also detects receptor guanylate cyclases and 2-5A-dependent ribonucleases. Sequence similarities
between these two families and the eukaryotic protein kinase family have been noticed before. It also detects Arabidopsis
thaliana kinase-like protein TMKL1, which seems to have lost its catalytic activity. If a protein analyzed includes the
two protein kinase signatures, the probability of it being a protein kinase is close to 100%. Eukaryotic-type protein
kinases have also been found in prokaryotes such as Myxococcus xanthus [11] and Yersinia pseudotuberculosis.

[ 1] Hanks S.K., Hunter T. FASEB J. 9:576-596(1995).

[ 2] Hunter T. Meth. Enzymol. 200:3-37(1991).

[ 3] Hanks S.K., Quinn A.M. Meth. Enzymol. 200:38-62(1991).

[ 4] Hanks S.K. Curr. Opin. Struct. Biol. 1:369-383(1991).

[ 5] Hanks S.K., Quinn A.M., Hunter T. Science 241:42-52(1988).

[ 6] Knighton D.R., Zheng J., Ten Eyck L.F., Ashford V.A., Xuong N.-H., Taylor S.S., Sowadski J.M. Science
253:407-414(1991).

[ 7] Bairoch A., Claverie J.-M. Nature 331:22(1988).

[ 8] Benner S. Nature 329:21-21(1987).

[ 9] Kirby R. J. Mol. Evol. 30:489-492(1992).

[10] Littler E., Stuart A.D., Chee M.S. Nature 358:160-162(1992).

[11] Munoz-Dorado J., Inouye S., Inouye M. Cell 67:995-1006(1991).
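
The protein kinase entry states that a protein carrying both kinase signatures (the ATP-binding region and one of the two active-site patterns) is very likely to be a protein kinase. The Python sketch below shows how that two-signature test could be applied. The regular expressions are hand transcriptions of the three consensus patterns quoted above; the function names and the toy sequence are illustrative assumptions, and the sequence is synthetic rather than a database entry.

```python
import re

# ATP-binding region signature (the final K binds ATP).
ATP_BINDING = re.compile(
    r"[LIV]G[^P]G[^P][FYWMGSTNH][SGA][^PW][LIVCAT][^PD].[GSTACLIVMFY]"
    r".{5,18}[LIVMFYWCSTAR][AIVP][LIVMFAGCKR]K"
)
# Active-site signatures; D is the active site residue in both.
SER_THR_ACTIVE_SITE = re.compile(r"[LIVMFYC].[HY].D[LIVMFY]K.{2}N[LIVMFYCT]{3}")
TYR_ACTIVE_SITE = re.compile(r"[LIVMFYC].[HY].D[LIVMFY][RSTAC].{2}N[LIVMFYC]{3}")


def kinase_signatures(sequence):
    """Report which of the three signatures occur in the sequence."""
    return {
        "atp_binding": ATP_BINDING.search(sequence) is not None,
        "ser_thr": SER_THR_ACTIVE_SITE.search(sequence) is not None,
        "tyr": TYR_ACTIVE_SITE.search(sequence) is not None,
    }


def looks_like_protein_kinase(sequence):
    """Apply the two-signature rule: ATP-binding region plus an active-site pattern."""
    hits = kinase_signatures(sequence)
    return hits["atp_binding"] and (hits["ser_thr"] or hits["tyr"])


# Synthetic sequence containing a glycine-rich stretch and a Ser/Thr-type active site.
toy = "MA" + "LGTGSFGRVMLVKHKETGNHYAMK" + "AAAA" + "LIYRDLKPENLLI"
print(kinase_signatures(toy), looks_like_protein_kinase(toy))
```

As the entry itself notes, a negative result is weaker evidence than a positive one, since some divergent kinases are missed by the ATP-binding pattern.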

[1245] Receptor tyrosine kinase class II signature

A number of growth factors stimulate mitogenesis by interacting with a family of cell surface receptors which possess
an intrinsic, ligand-sensitive, protein tyrosine kinase activity [1]. These receptor tyrosine kinases (RTK) all share the
same topology: an extracellular ligand-binding domain, a single transmembrane region and a cytoplasmic kinase
domain. However they can be classified into at least five groups. The prototype for class II RTK's is the insulin receptor,
a heterotetramer of two alpha and two beta chains linked by disulfide bonds. The alpha and beta chains are cleavage
products of a precursor molecule. The alpha chain contains the ligand binding site, the beta chain transverses the
membrane and contains the tyrosine protein kinase domain. The receptors currently known to belong to class II are:
- Insulin receptor from vertebrates. - Insulin growth factor I receptor from mammals. - Insulin receptor-related receptor
(IRR), which is most probably a receptor for a peptide belonging to the insulin family. - Insects insulin-like receptors. -
Molluscan insulin-related peptide(s) receptor (MIP-R). - Insulin-like peptide receptor from Branchiostoma lanceolatum.
- The Drosophila developmental protein sevenless, a putative receptor for positional information required for the
formation of the R7 photoreceptor cells. - The trk family of receptors (NTRK1, NTRK2 and NTRK3), which are high affinity
receptors for nerve growth factor and related neurotrophic factors (BDNF and NT-3). And the following uncharacterized
receptors: - ROS. - LTK (TYK1). - EDDR1 (cak, TRKE, RTK6). - NTRK3 (Tyro10, TKT). - A sponge putative receptor
tyrosine kinase. While only the insulin and the insulin growth factor I receptors are known to exist in the tetrameric
conformation specific to class II RTK's, all the above proteins share extensive homologies in their kinase domain,
especially around the putative site of autophosphorylation. Hence, a signature pattern was developed for this class of
RTK's, which includes the tyrosine residue, itself probably autophosphorylated.

[ 1] Dawson J.H. Science 240:433-439(1988).

[ 2] Kimura S., Ikeda-Saito M. Proteins 3:113-120(1988).

[ 3] Henrissat B., Saloheimo M., Lavaitte S., Knowles J.K.C. Proteins 8:251-257(1990).
[ 4] Welinder K.G. Biochim. Biophys. Acta 1080:215-220(1991).

[1236] 455. pfkB family of carbohydrate kinases signatures 

It has been shown [1,2,3] that the following carbohydrate and purine kinases are evolutionary related and can be
grouped into a single family, which is known [1] as the 'pfkB family': - Fructokinase (EC 2.7.1.4) (gene scrK).
- 6-phosphofructokinase isozyme 2 (EC 2.7.1.11) (phosphofructokinase-2) (gene pfkB). pfkB is a minor phosphofructokinase
isozyme in Escherichia coli and is not evolutionary related to the major isozyme (gene pfkA). Plant
6-phosphofructokinases also belong to this family. - Ribokinase (EC 2.7.1.15) (gene rbsK). - Adenosine kinase (EC 2.7.1.20) (gene
ADK). - 2-dehydro-3-deoxygluconokinase (EC 2.7.1.45) (gene kdgK). - 1-phosphofructokinase (EC 2.7.1.56) (fructose
1-phosphate kinase) (gene fruK). - Inosine-guanosine kinase (EC 2.7.1.73) (gene gsk). - Tagatose-6-phosphate kinase
(EC 2.7.1.144) (phosphotagatokinase) (gene lacC). - Escherichia coli hypothetical protein yeiC. - Escherichia coli
hypothetical protein yeiI. - Escherichia coli hypothetical protein yhfQ. - Escherichia coli hypothetical protein yihV. - Bacillus
subtilis hypothetical protein yxdC. - Yeast hypothetical protein YJR105w. All the above kinases are proteins of from 280
to 430 amino acid residues that share a few regions of sequence similarity. Two of these regions were selected as
signature patterns. The first pattern is based on a region rich in glycine which is located in the N-terminal section of
these enzymes, while the second pattern is based on a conserved region in the C-terminal section.
[1237] Consensus pattern: [AG]-G-x(0,1)-[GAP]-x-N-x-[STA]-x(6)-[GS]-x(9)-G
Consensus pattern: [DNSK]-[PSTV]-x-[SAG](2)-[GD]-D-x(3)-[SAGV]-[AG]-[LIVMFYA]-[LIVMSTAP]

[ 1] Wu L.-F., Reizer A., Reizer J., Cai B., Tomich J.M., Saier M.H. Jr. J. Bacteriol. 173:3117-3127(1991).
[ 2] Orchard L.M.D., Kornberg H.L. Proc. R. Soc. Lond., B, Biol. Sci. 242:87-90(1990).
[ 3] Blatch G.L., Scholle R.R., Woods D.R. Gene 95:17-23(1990).

[1238] 456. Phospholipase A2 active sites signatures 

Phospholipase A2 (EC 3.1.1.4) (PA2) [1,2] is an enzyme which releases fatty acids from the second carbon group of
glycerol. PA2's are small and rigid proteins of 120 amino-acid residues that have four to seven disulfide bonds. PA2
binds a calcium ion which is required for activity. The side chains of two conserved residues, a histidine and an aspartic
acid, participate in a 'catalytic network'. Many PA2's have been sequenced from snakes, lizards, bees and mammals.
In the latter, there are at least four forms: pancreatic, membrane-associated as well as two less characterized forms.
The venom of most snakes contains multiple forms of PA2. Some of them are presynaptic neurotoxins which inhibit
neuromuscular transmission by blocking acetylcholine release from the nerve termini. Two different signature patterns
were derived for PA2's. The first is centered on the active site histidine and contains three cysteines involved in disulfide
bonds. The second is centered on the active site aspartic acid and also contains three cysteines involved in disulfide
bonds.

[1239] Consensus pattern: C-C-x(2)-H-x(2)-C [H is the active site residue]. This pattern will not detect some snake
toxins homologous with PA2 but which have lost their catalytic activity, as well as otoconin-22, a Xenopus protein from
the aragonitic otoconia which is also unlikely to be enzymatically active.

Consensus pattern: [LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C [D is the active site residue]. Detects the majority of functional
and non-functional PA2's. Undetected sequences are bee PA2, gila monster PA2's, PA2 PL-X from habu and PA2
PA-5 from mulga.

[ 1] Davidson F.F., Dennis E.A. J. Mol. Evol. 31:228-238(1990).

[ 2] Gomez F., Vandermeers A., Vandermeers-Piret M.-C., Herzog R., Rathe J., Stievenart M., Winand J.,
Christophe J. Eur. J. Biochem. 186:23-33(1989).
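
Both phospholipase A2 patterns above are built around a specific active site residue (a histidine in the first, an aspartic acid in the second). A short Python sketch of how a match could be used to report the position of that residue, rather than only the presence of the pattern, is given below. The regular expressions are hand transcriptions of the two quoted patterns; the function name and the toy sequence are assumptions for illustration, and the sequence is synthetic.

```python
import re

# The capture group marks the active site residue inside each signature.
HIS_SITE = re.compile(r"CC.{2}(H).{2}C")
ASP_SITE = re.compile(r"[LIVMA]C[^LIVMFYWPCST]C(D).{5}C")


def active_site_positions(sequence):
    """Return 1-based positions of candidate active-site His and Asp residues."""
    return {
        "His": [m.start(1) + 1 for m in HIS_SITE.finditer(sequence)],
        "Asp": [m.start(1) + 1 for m in ASP_SITE.finditer(sequence)],
    }


# Synthetic sequence with one copy of each signature.
toy = "AAACCAAHAACAAAA" + "ACACD" + "AAAAAC" + "AA"
print(active_site_positions(toy))
```

Reporting positions in this way makes it straightforward to confirm that the matched residue is the one annotated as catalytic in a given sequence record.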

[1240] 457. Phosphorylase pyridoxal-phosphate attachment site. Phosphorylases (EC 2.4.1.1) [1] are important
allosteric enzymes in carbohydrate metabolism. They catalyze the formation of glucose 1-phosphate from polyglucose
such as glycogen, starch or maltodextrin. Enzymes from different sources differ in their regulatory mechanisms and
their natural substrates. However, all known phosphorylases share catalytic and structural properties. They are
pyridoxal-phosphate dependent enzymes; the pyridoxal-P group is attached to a lysine residue around which the sequence
is highly conserved and can be used as a signature pattern to detect this class of enzymes.
[1241] Consensus pattern: E-A-[SC]-G-x-[GS]-x-M-K-x(2)-[LM]-N [K is the pyridoxal-P attachment site]
[ 1] Fukui T., Shimomura S., Nakano K. Mol. Cell. Biochem. 42:129-144(1982).
[1242] 458. Protein kinases signatures and profile 

Eukaryotic protein kinases [1 to 5] are enzymes that belong to a very extensive family of proteins which share a con-
proline residue. Proline dipeptidase (EC 3.4.13.9) (prolidase) splits dipeptides with a prolyl residue in the carboxyl
terminal position. Bacterial aminopeptidase P II (gene pepP) [1], proline dipeptidase (gene pepQ) [2], and human proline
dipeptidase (gene PEPD) [3] are evolutionary related. These proteins are manganese metalloenzymes. Yeast
hypothetical proteins YER078c and YFR006w and Mycobacterium tuberculosis hypothetical protein MtCY49.29c also
belong to this family. As a signature pattern for these enzymes a conserved region was selected that contains three
histidine residues.

[1231] Consensus pattern: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE]

[ 1] Yoshimoto T., Tone H., Honda T., Osatomi K., Kobayashi R., Tsuru D. J. Biochem. 105:412-416(1989).
[ 2] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:6439-6439(1990).

[ 3] Endo F., Tanoue A., Nakai H., Hata A., Indo Y., Titani K., Matsuda I. J. Biol. Chem. 264:4476-4481(1989).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1232] Methionine aminopeptidase signatures (pep2) 

Methionine aminopeptidase (EC 3.4.11.18) (MAP) is responsible for the removal of the amino-terminal (initiator) methionine
from nascent eukaryotic cytosolic and cytoplasmic prokaryotic proteins if the penultimate amino acid is small
and uncharged. All MAP studied to date are monomeric proteins that require cobalt ions for activity. Two subfamilies
of MAP enzymes are known to exist [1,2]. While being evolutionary related, they only share a limited amount of sequence
similarity mostly clustered around the residues shown, in the Escherichia coli MAP [3], to be involved in cobalt-binding.
The first family consists of enzymes from prokaryotes as well as eukaryotic MAP-1, while the second group
is made up of archebacterial MAP and eukaryotic MAP-2. The second subfamily also includes proteins which do not
seem to be MAP, but that are clearly evolutionary related such as mouse proliferation-associated protein 1 and fission
yeast curved DNA-binding protein. For each of these subfamilies, a specific signature pattern was developed that
includes residues known to be involved in cobalt-binding.

[1233] Consensus pattern: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV] [H is a cobalt ligand]
Consensus pattern: [DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-[DN] [The second D and the last D/
N are cobalt ligands]

[ 1] Arfin S.M., Kendall R.L., Hall L., Weaver L.H., Stewart A.E., Matthews B.W., Bradshaw R.A. Proc. Natl. Acad.
Sci. U.S.A. 92:7714-7718(1995).

[ 2] Keeling P.J., Doolittle W.F. Trends Biochem. Sci. 21:285-286(1996).
[ 3] Roderick S.L., Mathews B.W. Biochemistry 32:3907-3912(1993).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).
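
The two MAP consensus patterns lend themselves to a simple subfamily check. The following is a minimal sketch, not part of the original disclosure: the regular expressions are hand translations of the two patterns, and the function name `classify_map` and both toy sequences are hypothetical, constructed only so that each one contains exactly one of the signatures.

```python
import re

# Regular-expression forms of the two MAP consensus patterns
# (hand-translated; 'x' becomes '.', 'x(n)' becomes '.{n}').
MAP_SIGNATURES = {
    "subfamily 1": re.compile(r"[MFY].GHG[LIVMC][GSH].{3}H.{4}[LIVM].[HN][YWV]"),
    "subfamily 2": re.compile(r"[DA][LIVMY].K[LIVM]D.G.[HQ][LIVM][DNS]G.{3}[DN]"),
}


def classify_map(sequence: str):
    """Return the names of the MAP signatures found in the sequence."""
    return [name for name, regex in MAP_SIGNATURES.items() if regex.search(sequence)]


# Hypothetical sequences, each built to carry one signature only.
toy_type1 = "MAGHGIGAAAHAAAALAHY"
toy_type2 = "DLAKIDAGAHIDGAAAD"

print(classify_map(toy_type1))
print(classify_map(toy_type2))
```

A sequence that matches neither pattern yields an empty list, which in practice would only mean that the signature is absent, not that the protein is unrelated.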

[1234] 454. Peroxidases signatures

Peroxidases (EC 1.11.1.-) [1] are heme-binding enzymes that carry out a variety of biosynthetic and degradative functions
using hydrogen peroxide as the electron acceptor. Peroxidases are widely distributed throughout bacteria, fungi,
plants, and vertebrates. In peroxidases the heme prosthetic group is protoporphyrin IX and the fifth ligand of the heme
iron is a histidine (known as the proximal histidine). Another histidine residue (the distal histidine) serves as an acid-base
catalyst in the reaction between hydrogen peroxide and the enzyme. The regions around these two active site
residues are more or less conserved in a majority of peroxidases [2,3]. The enzymes in which one or both of these
regions can be found are listed below.

- Yeast cytochrome c peroxidase (EC 1.11.1.5).

- Myeloperoxidase (EC 1.11.1.7) (MPO). MPO is found in granulocytes and monocytes and plays a major role in the
oxygen-dependent microbicidal system of neutrophils.

- Lactoperoxidase (EC 1.11.1.7) (LPO). LPO is a milk protein which acts as an antimicrobial agent.

- Eosinophil peroxidase (EC 1.11.1.7) (EPO). An enzyme found in the cytoplasmic granules of eosinophils.

- Thyroid peroxidase (EC 1.11.1.8) (TPO). TPO plays a central role in the biosynthesis of thyroid hormones. It catalyzes
the iodination and coupling of the hormonogenic tyrosines in thyroglobulin to yield the thyroid hormones T3 and T4.

- Fungal ligninases. Ligninase catalyzes the first step in the degradation of lignin. It depolymerizes lignin by catalyzing
the C(alpha)-C(beta) cleavage of the propyl side chains of lignin.

- Plant peroxidases (EC 1.11.1.7). Plants express a large number of isozymes of peroxidases. Some of them play a role
in cell-suberization by catalyzing the deposition of the aromatic residues of suberin on the cell wall, some are
expressed as a defense response toward wounding, others are involved in the metabolism of auxin and the
biosynthesis of lignin.

- Prokaryotic catalase-peroxidases. Some bacterial species produce enzymes that exhibit both catalase and
broad-spectrum peroxidase activities [4]. Examples of such enzymes are: catalase HP I from Escherichia coli
(gene katG) and perA from Bacillus stearothermophilus.
[1235] Consensus pattern: [DET]-[LIVMTA]-x(2)-[LIVM]-[LIVMSTAG]-[SAG]-[LIVMSTAG]-H-[STA]-[LIVMFY] [H is
the proximal heme-binding ligand]

Consensus pattern: [SGATV]-x(3)-[LIVMA]-R-[LIVMA]-x-[FW]-H-x-[SAC] [H is an active site residue]
1587-1643(1995). [ 3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J. FEBS Lett. 357:
129-134(1995). [ 4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C. J. Biol. Chem. 269:
27136-27142(1994). [ 5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C. Appl. Environ. Microbiol.
59:330-333(1993). [ 6] Higgins D.G., McConnell D.J., Sharp P.M. Nature 340:604-604(1989). [ 7] Rawlings N.D., Barrett
A.J. Meth. Enzymol. 244:461-486(1994).

[1220] 450. (peptidase M24) Aminopeptidase P and proline dipeptidase signature (1).

Aminopeptidase P (EC 3.4.11.9) is the enzyme responsible for the release of any N-terminal amino acid adjacent to a
proline residue. Proline dipeptidase (EC 3.4.13.9) (prolidase) splits dipeptides with a prolyl residue in the carboxyl
terminal position. Bacterial aminopeptidase P II (gene pepP) [1], proline dipeptidase (gene pepQ) [2], and human proline
dipeptidase (gene PEPD) [3] are evolutionary related. These proteins are manganese metalloenzymes. Yeast hypothetical
proteins YER078c and YFR006w and Mycobacterium tuberculosis hypothetical protein MtCY49.29c also belong
to this family. As a signature pattern for these enzymes a conserved region that contains three histidine residues
has been developed.

[1221] Consensus pattern: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE]

[1222] [ 1] Yoshimoto T., Tone H., Honda T., Osatomi K., Kobayashi R., Tsuru D. J. Biochem. 105:412-416(1989).
[ 2] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:6439-6439(1990). [ 3] Endo F., Tanoue A., Nakai H., Hata A.,
Indo Y., Titani K., Matsuda I. J. Biol. Chem. 264:4476-4481(1989). [ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:
183-228(1995).

[1223] Methionine aminopeptidase signatures (2). Methionine aminopeptidase (EC 3.4.11.18) (MAP) is responsible
for the removal of the amino-terminal (initiator) methionine from nascent eukaryotic cytosolic and cytoplasmic prokaryotic
proteins if the penultimate amino acid is small and uncharged. All MAP studied to date are monomeric proteins
that require cobalt ions for activity. Two subfamilies of MAP enzymes are known to exist [1,2]. While being evolutionary
related, they only share a limited amount of sequence similarity mostly clustered around the residues shown, in the
Escherichia coli MAP [3], to be involved in cobalt-binding. The first family consists of enzymes from prokaryotes as well
as eukaryotic MAP-1, while the second group is made up of archebacterial MAP and eukaryotic MAP-2. The second
subfamily also includes proteins which do not seem to be MAP, but that are clearly evolutionary related such as mouse
proliferation-associated protein 1 and fission yeast curved DNA-binding protein. For each of these subfamilies, a specific
signature pattern that includes residues known to be involved in cobalt-binding has been developed.
[1224] Consensus pattern: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV] [H is a cobalt ligand]
Consensus pattern: [DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-[DN] [The second D and the last D/
N are cobalt ligands]

[1225] [ 1] Arfin S.M., Kendall R.L., Hall L., Weaver L.H., Stewart A.E., Matthews B.W., Bradshaw R.A. Proc. Natl.
Acad. Sci. U.S.A. 92:7714-7718(1995). [ 2] Keeling P.J., Doolittle W.F. Trends Biochem. Sci. 21:285-286(1996). [ 3]
Roderick S.L., Mathews B.W. Biochemistry 32:3907-3912(1993). [ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:
183-228(1995).

[1226] 451. Cytochrome P450 cysteine heme-iron ligand signature

Cytochrome P450's [1,2,3,E1] are a group of enzymes involved in the oxidative metabolism of a high number of natural
compounds (such as steroids, fatty acids, prostaglandins, leukotrienes, etc) as well as drugs, carcinogens and mutagens.
Based on sequence similarities, P450's have been classified into about forty different families [4,5]. P450's are
proteins of 400 to 530 amino acids; the only exception is Bacillus BM-3 (CYP102) which is a protein of 1048 residues
that contains a N-terminal P450 domain followed by a reductase domain. P450's are heme proteins. A conserved
cysteine residue in the C-terminal part of P450's is involved in binding the heme iron in the fifth coordination site. From
a region around this residue, a ten residue signature was developed specific to P450's.
[1227] Consensus pattern: [FW]-[SGNH]-x-[GD]-x-[RHPT]-x-C-[LIVMFAP]-[GAD] [C is the heme iron ligand]

[ 1] Nebert D.W., Gonzalez F.J. Annu. Rev. Biochem. 56:945-993(1987).

[ 2] Coon M.J., Ding X., Pernecky S.J., Vaz A.D.N. FASEB J. 6:669-673(1992).

[ 3] Guengerich F.P. J. Biol. Chem. 266:10019-10022(1991).

[ 4] Nelson D.R., Kamataki T., Waxman D.J., Guengerich F.P., Estabrook R.W., Feyereisen R., Gonzalez F.J.,
Coon M.J., Gunsalus I.C., Gotoh O., Okuda K., Nebert D.W. DNA Cell Biol. 12:1-51(1993).
[ 5] Degtyarenko K.N., Archakov A.I. FEBS Lett. 332:1-8(1993).

[1228] 452. (Pec Lyase) Pectate lyase

This enzyme forms a right handed beta helix structure. Pectate lyase is an enzyme involved in the maceration and soft 
rotting of plant tissue. 

[1229] [1] Yoder MD, Keen NT, Jurnak F. Science 1993;260:1503-1507.

[1230] 453. (pep M24) Aminopeptidase P and proline dipeptidase signature (pep1)

Aminopeptidase P (EC 3.4.11.9) is the enzyme responsible for the release of any N-terminal amino acid adjacent to a
protruding in the periplasmic space. Two residues have been shown [2,3] to be essential for the catalytic activity of
SPase I: a serine and an lysine. SPase I is evolutionary related to the yeast mitochondrial inner membrane protease
subunit 1 and 2 (genes IMP1 and IMP2) which catalyze the removal of signal peptides required for the targeting of
proteins from the mitochondrial matrix, across the inner membrane, into the inter-membrane space [4]. In eukaryotes
the removal of signal peptides is effected by an oligomeric enzymatic complex composed of at least five subunits: the
signal peptidase complex (SPC). The SPC is located in the endoplasmic reticulum membrane. Two components of
mammalian SPC, the 18 Kd (SPC18) and the 21 Kd (SPC21) subunits as well as the yeast SEC11 subunit have been
shown [5] to share regions of sequence similarity with prokaryotic SPases I and yeast IMP1/IMP2. Three signature
patterns have been developed for these proteins. The first signature contains the putative active site serine, the second
signature contains the putative active site lysine which is not conserved in the SPC subunits, and the third signature
corresponds to a conserved region of unknown biological significance which is located in the C-terminal section of all
these proteins.

[1214] Consensus pattern: [GS]-x-S-M-x-[PS]-[AT]-[...] [S is an active site residue]

Consensus pattern: K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY] [K is an active site residue]

Consensus pattern: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG]

[1215] [ 1] Dalbey R.E., von Heijne G. Trends Biochem. Sci. 17:474-478(1992). [ 2] Sung M., Dalbey R.E. J. Biol.
Chem. 267:13154-13159(1992). [ 3] Black M.T. J. Bacteriol. 175:4957-4961(1993). [ 4] Nunnari J. [...]

[...] 2819-2828(1992). [ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994). [E1]

[1216] 449. (Peptidase C1) Eukaryotic thiol (cysteine) proteases active sites. Eukaryotic thiol proteases (EC 3.4.22.-)
[1] are a family of proteolytic enzymes which contain an active site cysteine. Catalysis proceeds through a thioester
intermediate and is facilitated by a nearby histidine side chain; an asparagine completes the essential catalytic triad.

[...] to belong to this family [...] (references are only provided for recently determined sequences).

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L [...]

- [...] Calpains are intracellular calcium-activated thiol protease that contain both a N-terminal catalytic domain
and a C-terminal calcium-binding domain.

- Mammalian cathepsin K, which seems [...]

- [...] O [4].

- Bleomycin hydrolase. An enzyme that catalyzes the inactivation of the antitumor drug BLM (a glycopeptide).

- Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/B4; [...] (EC 3.4.22.14); papaya latex papain [...],
caricain (EC 3.4.22.30), and proteinase [...]; pea turgor-responsive [...]; rape COT44; rice oryzain alpha, beta,
and gamma; tomato [...]

- [...] B-like proteinases from the worms Caenorhabditis elegans (genes gcp-1, cpr-3, cpr-4, cpr-5 and cpr-6),
Schistosoma [...] ostertagi (CP-1 and CP-3).

- Slime mold cysteine proteinases CP1 and CP2.

- Cruzipain from Trypanosoma cruzi and brucei.

- Throphozoite cysteine proteinase (TCP) from various Plasmodium species.

- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.

- Baculoviruses cathepsin-like enzyme (v-cath).

- Drosophila [...] calpain-like [...]

- Yeast thiol protease BLH1/YCP1/LAP3.

- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein.

Two bacterial peptidases are also part of this family:

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5].

- [...]

[...] proteolytic activity.

- Soybean oil body protein P34. This protein has its active site cysteine replaced by a glycine.

- Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L but with the active site cysteine replaced
by a serine. Rat testin should not be confused with mouse testin which is a LIM-domain protein (see [...]).

- Plasmodium falciparum serine-repeat protein (SERA), the major blood stage antigen. This protein of 111 Kd
possesses [...]

[...] the three active site residues are well conserved and can be used as signature patterns.

[1217] Consensus pattern: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] [C is the active site residue]. Note: the
residue in position 4 of the pattern is almost always cysteine; the only exceptions are calpains (Leu), bleomycin hydrolase [...]

Consensus pattern: [...]-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH] [H is the active site residue]

Consensus pattern: [FYCH]-[WI]-[LIVT]-x-[KRQAG]-N-[ST]-W-x(3)-[...]

Note: [...] belong to family C1 (papain-type) and C2 (calpains) in the classification [...]

[1219] [ 1] Dufour E. Biochimie 70:1335-1342(1988). [ 2] Kirschke H., Barrett A.J., Rawlings N.D. Protein Prof. 2:
[1202] Note: these proteins belong to families S9A/S9B/S9C in the classification of peptidases [4].

[ 1] Rawlings N.D., Polgar L., Barrett A.J. Biochem. J. 279:907-911(1991).
[ 2] Barrett A.J., Rawlings N.D.
[ 3] Polgar L., Szabo E.

[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).



[1203] 445. (Pterin 4a) 

Pterin 4 alpha carbinolamine dehydratase 

[1204] Pterin 4 alpha carbinolamine dehydratase is aka DCoH (dimerisation cofactor of hepatocyte nuclear factor
1-alpha).

[1205] Number of members: 11

[1206] [1] Cronk JD, Endrizzi JA, Alber T; Medline: 97052967 "High-resolution structures of the bifunctional enzyme
and transcriptional coactivator DCoH and its complex with a product analogue." Protein Sci 1996;5:1963-1972
[1207] 446. (Pyridox oxidase)
Pyridoxamine 5'-phosphate oxidase signature

[1208] Pyridoxamine 5'-phosphate oxidase (EC 1.4.3.5) is a FMN flavoprotein involved in the de novo synthesis of
pyridoxine (vitamin B6) and pyridoxal phosphate. It oxidizes pyridoxamine-5-P (PMP) and pyridoxine-5-P (PNP) to
pyridoxal-5-P. The sequences of the enzyme from bacterial (genes pdxH or fprA) [1] and fungal (gene PDX3) [2] sources
show that this protein has been highly conserved throughout evolution.

PdxH is evolutionary related [3] to one of the enzymes in the phenazine biosynthesis protein pathway, phzD (also
known as phzG). As a signature pattern, a highly conserved region was selected located in the C-terminal part of these
enzymes.

- Consensus pattern: [LIVF]-E-F-W-[QHG]-x(4)-R-[LIVM]-H-[DNE]-R
[ 1] Lam H.-M., Winkler M.E. J. Bacteriol. 174:6033-6045(1992).

[ 2] Loubbardi A., Karst F., Guilloton M., Marcireau C. J. Bacteriol. 177:1817-1823(1995).
[ 3] Pierson L.S. III, Gaffney T., Lam S., Gong F. FEMS Microbiol. Lett. 134:299-307(1995).
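
The consensus patterns in this listing follow PROSITE-style notation: residues separated by hyphens, alternatives in square brackets, `x` for any residue, and a repeat count in parentheses. As a minimal sketch (not part of the original disclosure), such a pattern can be translated into a regular expression and searched against a sequence. The function name `prosite_to_regex` and the test sequence are hypothetical and chosen only for illustration; anchors and other less common pattern features are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles single residues, [..] alternatives, {..} exclusions,
    x (any residue) and repeat counts such as x(4) or x(2,3).
    """
    pieces = []
    for element in pattern.split("-"):
        # Split each element into its core symbol and an optional repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # any amino acid
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        pieces.append(core)
    return "".join(pieces)


# Pyridoxamine 5'-phosphate oxidase signature as given in the text.
PDXH_PATTERN = "[LIVF]-E-F-W-[QHG]-x(4)-R-[LIVM]-H-[DNE]-R"
PDXH_REGEX = re.compile(prosite_to_regex(PDXH_PATTERN))

# Hypothetical test sequence, built so that the signature starts at index 3.
toy_sequence = "MKTLEFWQAAAARLHDRGG"
hit = PDXH_REGEX.search(toy_sequence)
print(hit.start(), hit.group())
```

The same converter can be applied to the other signatures in this listing, provided the pattern string has first been cleaned of any trailing annotation such as the bracketed note on ligand residues.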

[1209] 447. (Pyrophosphatase) 
Inorganic pyrophosphatase signature 

[1210] Inorganic pyrophosphatase (EC 3.6.1.1) (PPase) [1,2] is the enzyme responsible for the hydrolysis of pyrophosphate
(PPi) which is formed principally as the product of the many biosynthetic reactions that utilize ATP. All known
PPases require the presence of divalent metal cations, with magnesium conferring the highest activity. Among other
residues, a lysine has been postulated to be part or close to the active site. PPases have been sequenced from bacteria
such as Escherichia coli (homohexamer), thermophilic bacteria PS-3 and Thermus thermophilus, from the archaebacteria
Thermoplasma acidophilum, from fungi (homodimer), from a plant, and from bovine retina. In yeast, a mitochondrial
isoform of PPase has been characterized which seems to be involved in energy production and whose activity is
stimulated by uncouplers of ATP synthesis.

[1211] The sequences of PPases share some regions of similarities. As signature patterns a region was selected 
that contains three conserved aspartates that are involved in the binding of cations. 

- Consensus pattern: D-[SGDN]-D-[PE]-[LIVMF]-D-[LIVMGAC]
[The three D's bind divalent metal cations]

[ 1] Lahti R., Kolakowski L.F. Jr., Heinonen J., Vihinen M., Pohjanoksa K., Cooperman B.S. Biochim. Biophys. Acta
1038:338-345(1990).

[ 2] Cooperman B.S., Baykov A.A., Lahti R. Trends Biochem. Sci. 17:262-266(1992).
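
Because the pyrophosphatase signature annotates specific residues (the three aspartates), a search can report where those residues fall in a sequence rather than only whether the pattern is present. The sketch below is illustrative and not part of the original disclosure; the function name `cation_binding_positions` and the toy sequence are hypothetical.

```python
import re

# Hand-written regular expression for D-[SGDN]-D-[PE]-[LIVMF]-D-[LIVMGAC];
# each of the three aspartates is captured in its own named group.
PPASE_SIGNATURE = re.compile(
    r"(?P<d1>D)[SGDN](?P<d2>D)[PE][LIVMF](?P<d3>D)[LIVMGAC]"
)


def cation_binding_positions(sequence: str):
    """Return 1-based positions of the three signature aspartates, or None."""
    match = PPASE_SIGNATURE.search(sequence)
    if match is None:
        return None
    return tuple(match.start(name) + 1 for name in ("d1", "d2", "d3"))


# Hypothetical sequence built to contain one copy of the signature.
toy_ppase = "AAKDGDPIDVLL"
positions = cation_binding_positions(toy_ppase)
print(positions)
```

Positions are reported 1-based, following the usual convention for protein residue numbering.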

[1212] 448. (Peptidase S26) 
Signal peptidases I signatures.

[1213] Signal peptidases (SPases) [1] (aka leader peptidases) remove the signal peptides from secretory proteins.
In prokaryotes three types of SPases are known: type I (gene lepB) which is responsible for the processing of the
majority of exported pre-proteins; type II (gene lsp) which only process lipoproteins, and a third type involved in the
processing of pili subunits. SPase I (EC 3.4.21.89) is an integral membrane protein that is anchored in the cytoplasmic
membrane by one (in B. subtilis) or two (in E. coli) N-terminal transmembrane domains with the main part of the protein
In the sequence of all these enzymes there is a small conserved region which may be involved in the enzymatic activity
and/or be part of the PRPP binding site [1].

- Note: in position 11 of the pattern most of these enzymes have Gly.

[1196] [ 1] Hershey H.V., Taylor M.W. Gene 43:287-293(1986).
[1197] 443. (Pro CA)

Prokaryotic-type carbonic anhydrases signatures

Carbonic anhydrases (EC 4.2.1.1) (CA) are zinc metalloenzymes which catalyze the reversible hydration of carbon
dioxide. In Escherichia coli, CA (gene cynT) is involved in recycling carbon dioxide formed in the bicarbonate-dependent
decomposition of cyanate by cyanase (gene cynS). By this action, it prevents the depletion of cellular bicarbonate [1].
In photosynthetic bacteria and plant chloroplast, CA is essential to inorganic carbon fixation [2]. Prokaryotic and plant [...]

different forms of eukaryotic CA's (see <PDOC00146>). Hypothetical proteins yadF from Escherichia coli and [...]
from Haemophilus influenzae also belong to this family. Two signature patterns were developed for this family of
enzymes. Both patterns contain conserved residues that could be involved in binding zinc (cysteine and histidine).

- Consensus pattern: C-[SA]-D-S-R-[LIVM]-x-[AP]

- Consensus pattern: [EQ]-Y-A-[LIVM]-x(2)-[LIVM]-x(4)-[LIVMF](3)-x-G-H-x(2)-C-G

[ 1] Guilloton M.B., Korte J.J., Lamblin A.F., Fuchs J.A., Anderson P.M. J. Biol. Chem. 267:3731-3734(1992).
[ 2] Fukuzawa H., Suzuki E., Komukai Y., Miyachi S. Proc. Natl. Acad. Sci. U.S.A. 89:4437-4441(1992).

[1198] 444. (Prolyl_oligopep)

Prolyl oligopeptidase family serine active site

[1199] The prolyl oligopeptidase family [1,2,3] consist of a number of evolutionary related peptidases whose catalytic
activity seems to be provided by a charge relay system similar to that of the trypsin family [...]
which evolved by independent convergent evolution. The known members of this family are listed below.

- Prolyl endopeptidase (EC 3.4.21.26) (PE) (also called post-proline cleaving enzyme). PE is an enzyme that cleaves
[...] obtained [...] (Flavobacterium meningosepticum and Aeromonas hydrophila); there is a high
degree of sequence conservation between these sequences.

- [...] (oligopeptidase B) (gene prtB) which cleaves peptide bonds on the C-terminal side of lysyl and argininyl
residues.

- Dipeptidyl peptidase IV (EC 3.4.14.5) (DPP IV). DPP IV is an enzyme that removes N-terminal dipeptides sequentially
from polypeptides having unsubstituted N-termini provided that the penultimate residue is proline.

- [...]

- Yeast vacuolar dipeptidyl aminopeptidase B (DPAP B) (gene: DAP2).

- [...] (acyl-peptide hydrolase). This enzyme catalyzes the hydrolysis [...]

A conserved serine residue has experimentally been shown (in E. coli protease II as well as in pig and bacterial
PE) to be necessary for the catalytic mechanism. This serine, which is part of the catalytic triad (Ser, His, Asp), [...]
[1190] [ 1] Villalba M., Batanero E., Lopez-Otin C., Sanchez L.M., Monsalve R.I., Gonzalez De La Pena M.A., Lahoz
C., Rodriguez R. Eur. J. Biochem. 216:863-869(1993).
[1191] 439. Pollen allergen

This family contains allergens Lol PI, PII and PIII from Lolium perenne.
Number of members: 49

[1]

Medline: 90105394

Complete primary structure of a Lolium perenne (perennial rye grass) pollen allergen, Lol p III: comparison with known
Lol p I and II sequences.
Ansari AA, Shenbagamurthi P, Marsh DG;
Biochemistry 1989;28:8665-8670.
[1192] 440. Porphobilinogen deaminase cofactor-binding site

Porphobilinogen deaminase (EC 4.3.1.8), or hydroxymethylbilane synthase, is an enzyme involved in the biosynthesis
of porphyrins and related macrocycles. It catalyzes the assembly of four porphobilinogen (PBG) units in a head to tail
fashion to form hydroxymethylbilane.

The enzyme covalently binds a dipyrromethane cofactor to which the PBG subunits are added in a stepwise fashion.
In the Escherichia coli enzyme (gene hemC), this cofactor has been shown [1] to be bound by the sulfur atom of a
cysteine. The region around this cysteine is conserved in porphobilinogen deaminases from various prokaryotic and
eukaryotic sources.

- Consensus pattern: E-R-x-[LIVMFA]-x(3)-[LIVMF]-x-G-[GSA]-C-x-[IVT]-P-[LIVMF]-[GSA]
[C is the cofactor attachment site]

[1193] [ 1] Miller A.D., Hart G.J., Packman L.C., Battersby A.R. Biochem. J. 254:915-918(1988).
[1194] 441. Presenilin

Mutations in presenilin-1 are a major cause of early onset Alzheimer's disease [2]. It has been found that presenilin-1
(Swiss:P49768) binds to beta-catenin in vivo [4]. This family also contains SPE proteins from C. elegans.
Number of members: 23

[1]

Medline: 98045995

Presenilins and Alzheimer's disease.
Kim TW, Tanzi RE;

Curr Opin Neurobiol 1997;7:683-688.
[2] Medline: 98045995
Presenilins and Alzheimer's disease.
Kim TW, Tanzi RE;

Curr Opin Neurobiol 1997;7:683-688.
[3] Medline: 98099802
Interaction of presenilins with the filamin family of actin-binding proteins.
Zhang W, Han SW, McKeel DW, Goate A, Wu JY;
J Neurosci 1998;18:914-922.
[4] Medline: 99004850

Destabilisation of beta-catenin by mutations in presenilin-1 potentiates neuronal apoptosis.

Zhang Z, Hartmann H, Do VM, Abramowski D, Sturchler-Pierrat
C, Staufenbiel M, Sommer B, van de Wetering M, Clevers H,
Saftig P, De Strooper B, He X, Yankner BA;

Nature 1998;395:698-702.
[1195] 442. (Pribosyltran) Purine/pyrimidine phosphoribosyl transferases signature 

Phosphoribosyltransferases (PRT) are enzymes that catalyze the synthesis of beta-n-5'-monophosphates from
phosphoribosylpyrophosphate (PRPP) and an enzyme specific amine. A number of PRT's are involved in the biosynthesis
of purine, pyrimidine, and pyridine nucleotides, or in the salvage of purines and pyrimidines. These enzymes are:

- Adenine phosphoribosyltransferase (EC 2.4.2.7) (APRT), which is involved in purine salvage.

- Hypoxanthine-guanine or hypoxanthine phosphoribosyltransferase (EC 2.4.2.8) (HGPRT or HPRT), which are
involved in purine salvage.

- Orotate phosphoribosyltransferase (EC 2.4.2.10) (OPRT), which is involved in pyrimidine biosynthesis.

- Amido phosphoribosyltransferase (EC 2.4.2.14), which is involved in purine biosynthesis.

- Xanthine-guanine phosphoribosyltransferase (EC 2.4.2.22) (XGPRT), which is involved in purine salvage.



central part 
Consensus pattern: [FY]-x(2)-T-R-H-N-x-G-x(2)-[LIVMFA](2)-[DE]
Consensus pattern: [GS]-x(3)-H-N-G-[LIVM]-[KR]-[DNS]-[LIVMT]

[ 1] Garcia-Villegas M.R., De La Vega F.M., Galindo J.M., Segura M., Buckingham R.H., Guarneros G. EMBO J.
10:3549-3555(1991).

[ 2] De La Vega F.M., Galindo J.M., Old I.G., Guarneros G. Gene 169:97-100(1996).
[ 3] Ouzounis C., Bork P., Casari G., Sander C. Protein Sci. 4:2424-2428(1995).

[1187] 436. (Peptidase M17) Cytosol aminopeptidase signature

Cytosol aminopeptidase is a eukaryotic cytosolic zinc-dependent exopeptidase that catalyzes the removal of unsubstituted
amino-acid residues from the N-terminus of proteins. This enzyme is often known as leucine aminopeptidase
(EC 3.4.11.1) (LAP) but has been shown [1] to be identical with prolyl aminopeptidase (EC 3.4.11.5). Cytosol
aminopeptidase is a hexamer of identical chains, each of which binds two zinc ions.

Cytosol aminopeptidase is highly similar to Escherichia coli pepA, a manganese dependent aminopeptidase. Residues
involved in zinc ion-binding [2] in the mammalian enzyme are absolutely conserved in pepA where they presumably
bind manganese.

A cytosol aminopeptidase from Rickettsia prowazekii [3] and one from Arabidopsis thaliana also belong to this family.
As a signature pattern for these enzymes, a perfectly conserved octapeptide was selected which contains two residues
involved in binding metal ions: an aspartate and a glutamate.

- Consensus pattern: N-T-D-A-E-G-R-L [The D and the E are zinc/manganese ligands]

- Note: these proteins belong to family M17 in the classification of peptidases [4,E1].

[ 1] Matsushima M., Takahashi T., Ichinose M., Miki K., Kurokawa K., Takahashi K. Biochem. Biophys. Res. Commun.
178:1459-1464(1991).

[ 2] Burley S.K., David P.R., Sweet R.M., Taylor A., Lipscomb W.N. J. Mol. Biol. 224:113-140(1992).
[ 3] Wood D.O., Solomon M.J., Speed R.R. J. Bacteriol. 175:159-165(1993).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1188] 437. Assemblin (Peptidase family S21)
[1]

Medline: 96399137

Three-dimensional structure of human cytomegalovirus protease.

Shieh HS, Kurumbail RG, Stevens AM, Stegeman RA, Sturman EJ,
Pak JY, Wittwer AJ, Palmier MO, Wiegand RC, Holwerda BC,
Stallings WC;
Nature 1996;383:279-282.
Number of members: 29

[1189] 438. Pollen proteins Ole e I family signature 

The following plant pollen proteins, whose biological function is not yet known, are structurally related [1]:

- Olive tree pollen major allergen (Ole e I).

- Tomato anther-specific protein LAT52.

- Maize pollen-specific protein ZmC13.

These proteins are most probably secreted and consist of about 145 residues. As shown in the following schematic
representation, there are six cysteines which are conserved in the sequence of these proteins. They seem to be
involved in disulfide bonds.

xxxxxxCxCxxxxxxxxxCxxxxxxxxxxxxxxxxxCxxxxxCxxxxxxxxxxxxxxxxxxxxCxxxxxxx
*******

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.

Consensus pattern: [EQ]-G-x-V-Y-C-D-T-C-R [The two C's are probably involved in disulfide bonds]
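
The schematic above marks the pattern position with '*' under the sequence. A comparable annotation can be generated for any sequence, as in the following minimal sketch. It is an illustration only, not part of the original disclosure; the function name `annotate` and the toy sequence are hypothetical.

```python
import re

# Regular-expression form of [EQ]-G-x-V-Y-C-D-T-C-R.
OLE_E_1_SIGNATURE = re.compile(r"[EQ]G.VYCDTCR")


def annotate(sequence: str):
    """Return (marker_line, cysteine_positions) for a protein sequence.

    marker_line has '*' under each residue covered by the signature and
    spaces elsewhere; cysteine_positions are 1-based.
    """
    markers = [" "] * len(sequence)
    match = OLE_E_1_SIGNATURE.search(sequence)
    if match:
        for i in range(match.start(), match.end()):
            markers[i] = "*"
    cysteines = [i + 1 for i, residue in enumerate(sequence) if residue == "C"]
    return "".join(markers), cysteines


# Hypothetical sequence containing one copy of the signature.
toy_pollen_protein = "AAEGHVYCDTCRAA"
marker_line, cysteines = annotate(toy_pollen_protein)
print(toy_pollen_protein)
print(marker_line)
print(cysteines)
```

Printing the sequence and the marker line one above the other reproduces the layout of the schematic, with the two signature cysteines falling inside the starred region.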
Number of members: 122

[1182] 431. (Parvo coat) Parvovirus coat protein. 72 members.
[1183] 432. Pectinesterase signatures 

Pectinesterase (EC 3.1.1.11) (pectin methylesterase) catalyzes the hydrolysis of pectin into pectate and methanol. In
plants, it plays an important role in cell wall metabolism during fruit ripening. In plant bacterial pathogens such as
Erwinia carotovora and in fungal pathogens such as Aspergillus niger, pectinesterase is involved in maceration and
soft-rotting of plant tissue.

Prokaryotic and eukaryotic pectinesterases share a few regions of sequence similarity [1,2,3]; two of these regions
were selected as signature patterns.

The first is based on a region in the N-terminal section of these enzymes; it contains a conserved tyrosine which may
play a role in the catalytic mechanism [3]. The second pattern corresponds to the best conserved region, an octapeptide
located in the central part of these enzymes.

- Consensus pattern: [GSTNP]-x(6)-[FYVHR]-[IVN]-[KEP]-x-G-[STIVKRQ]-Y-[DNQKRMV]-[...]

- Consensus pattern: [IV]-x-G-[STAD]-[LIVT]-D-[FYI]-[IV]-[FSN]-G

[ 1] Ray J., Knapp J., Grierson D., Bird C., Schuch W. Eur. J. Biochem. 174:119-124(1988).

[ 2] Plastow G.S. Mol. Microbiol. 2:247-254(1988).

[ 3] Markovic O., Joernvall H. Protein Sci. 1:1288-1292(1992).

[1184] 433. Pentapeptide repeats (8 copies)

These repeats are found in many cyanobacterial proteins.

The repeats were first identified in hglK [1]. The function of these repeats is unknown.
The structure of this repeat has been predicted to be a beta-helix [2].

The repeat can be approximately described as A(D/N)LXX, where X can be any amino acid.
Number of members: 75
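
Since the repeat is described by the approximate motif A(D/N)LXX, tandem copies can be counted with a short regular-expression scan. The sketch below is a simplification offered for illustration: it treats the approximate description as a strict rule, which real family members need not obey, and the function name `longest_tandem_run` and the toy sequence are hypothetical.

```python
import re

# One pentapeptide unit, A(D/N)LXX, where X is any residue.
REPEAT_UNIT = r"A[DN]L.."
TANDEM_RUN = re.compile(r"(?:%s)+" % REPEAT_UNIT)


def longest_tandem_run(sequence: str) -> int:
    """Return the number of units in the longest uninterrupted run of repeats."""
    runs = [len(m.group(0)) // 5 for m in TANDEM_RUN.finditer(sequence)]
    return max(runs, default=0)


# Hypothetical sequence with three consecutive repeat units.
toy_protein = "MK" + "ADLSG" + "ANLRQ" + "ADLTE" + "PP"
copies = longest_tandem_run(toy_protein)
print(copies)
```

A profile-based search would be more tolerant of divergent units than this strict motif scan.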
[1]

Medline: 96062225

The hglK gene is required for localization of heterocyst-specific glycolipids in the cyanobacterium
Anabaena sp. strain PCC 7120.
Black K, Buikema WJ, Haselkorn R;

J Bacteriol 1995;177:6440-6448.

[2] Medline: 98318059
Structure and distribution of pentapeptide repeats in bacteria.
Bateman A, Murzin A, Teichmann SA;

Protein Sci 1998;7:1477-1480.

[3] Medline: 98316713

Characterisation of an Arabidopsis cDNA encoding a thylakoid lumen protein related to a novel 'pentapeptide repeat'
family of proteins.

Kieselbach T, Mant A, Robinson C, Schroder WP;
FEBS Lett 1998;428:241-244.
[1185] 434. Polypeptide deformylase
[1]

Medline: 97002011

A new subclass of the zinc metalloproteases superfamily revealed by the solution structure of peptide deformylase.
Meinnel T, Blanquet S, Dardel F;
J Mol Biol 1996;262:375-386.

[2] Medline: 98332750
Solution structure of nickel-peptide deformylase.
Dardel F, Ragusa S, Lazennec C, Blanquet S, Meinnel T;

J Mol Biol 1998;280:501-513.
Number of members: 21

[1186] 435. Peptidyl-tRNA hydrolase signatures

Peptidyl-tRNA hydrolase (EC 3.1.1.29) (PTH) is a bacterial enzyme that cleaves peptidyl-tRNA or N-acyl-aminoacyl-tRNA
to yield free peptides or N-acyl-amino acids and tRNA. The natural substrate for this enzyme may be peptidyl-tRNA
which drop off the ribosome during protein synthesis [1,2]. Bacterial PTH has been found [2,3] to be evolutionary
related to yeast hypothetical protein YHR189w.

PTH and YHR189w are proteins of about 200 amino acid residues. As signature patterns, two conserved regions were
selected that each contain an histidine. The first of these regions is located in the N-terminal section, the other in the
- Arabidopsis thaliana peptide transporters PTR2-A and PTR2-B (also known as the histidine transporting protein

- Arabidopsis thaliana proton-dependent nitrate/chlorate transporter CHL1.
- Lactococcus proton-dependent di- and tri-peptide transporter dtpT.
- Caenorhabditis elegans hypothetical protein C06G8.2.
- Caenorhabditis elegans hypothetical protein F56F4.5.
- Caenorhabditis elegans hypothetical protein K04E7.2.
- Escherichia coli hypothetical protein ybgH.
- Escherichia coli hypothetical protein ydgR.
- Escherichia coli hypothetical protein yhiP.
- Escherichia coli hypothetical protein yjdL.
- Bacillus subtilis hypothetical protein yclF.

These integral membrane proteins are predicted to comprise twelve transmembrane regions. As signature patterns,
two of the best conserved regions were selected. The first is a region

- Consensus pattern: IFYT]-x(2HLMFY14FY>^.[LIVMFYWAhx-[l VG]-N.[LIVMA^^ 

[ 1] Paulsen I.T., Skurray R.A. Trends Biochem. Sci. 19:404-404(1994).
[ 2] Steiner H.-Y., Naider F., Becker J.M. Mol. Microbiol. 16:825-834(1995).

[1176] 427. Pumilio-family RNA binding domains (aka PUM-HD, Pumilio homology domain)
Puf domains are necessary and sufficient for sequence specific 

riJil!''"'**']? '".^ ^"""'"'^ ^ "^"^ ^^^'^ ^ ^^^-^^ P^^teins function as translational repressors in eariv 
rJSZcht'c^^^^ r ^"^"^"^^^ ^' ^-9^^ -^^^ (-9. the r,ar.os response "^^^^^^^ 
2a* are^^^ the Po« element (PME) in worm fem-3 mRNA). Other prcHeins that c^Z Pu 

S^rb^HM^^^^^^^^ ''''^ """"^ ^^'^^-Y^-^' -tance. appears to also contain a single RRM 

Puf domains usually occur as a tandem repeat of 8 domains 

IT 6 f r ! "^V"^^t^"»y 8 in all sequences; some sequences appear to have 5 

or 6 domains on initial analysis, but further analysis suggests the presence of additional divergent domains.
[1177] [1] Zhang B, Gallegos M, Puoti A, Durkin E, Fields S, Kimble J

nnr,^ p'JTp H ''''^^ ^""""^^ Jatehmann R. RNA 1997:3:1421-1433 

428. PWWP domain. The PWWP domain is named after a conserved Pro-Trp-Trp-Pro motif. The function of
the domain is currently unknown. Number of members: 19

WHSC1, a 90 kb SET domain-containing gene, expressed in early development and
homologous to a Drosophila dysmorphy gene maps in the Wolf-Hirschhorn syndrome critical region and is fused to
IgH in t(4;14) multiple myeloma. Stec I, Wright TJ, van Ommen GJB, de Boer PAJ, van Haeringen A, Moorman AFM,
Altherr MR, den Dunnen JT; Hum Mol Genet 1998;7:1071-1082.

[1180] 429. PX domain 

Eukaryotic domain of unknown function present in phox proteins, PLD isoforms, a PI3K isoform.
Number of members: 71

[1]

Medline: 97084820 

Novel domains in NADPH oxidase subunits, sorting nexins, and
PtdIns 3-kinases: binding partners of SH3 domains?
Ponting CP; 

Protein Sci 1996;5:2353-2357. 
[1181] 430. ParA family ATPase
[1] 

Medline: 91141297 

A family of ATPases involved in active partitioning of diverse bacterial plasmids.
Motallebi-Veshareh M, Rouch DA, Thomas CM;
Mol Microbiol 1990;4:1455-1463.






[1166] 421. N-(5'phosphoribosyl)anthranilate (PRA) isomerase

[1] Wilmanns M, Priestle JP, Niermann T, Jansonius JN;

J Mol Biol 1992;223:477-507. 

[1167] 422. (PRK) Phosphoribulokinase signature

Phosphoribulokinase (EC 2.7.1.19) (PRK) [1,2] is one of the enzymes specific to the Calvin reductive pentose phosphate
cycle, which is the major route by which carbon dioxide is assimilated and reduced by autotrophic organisms.
PRK catalyzes the ATP-dependent phosphorylation of ribulose 5-phosphate into ribulose 1,5-bisphosphate, which is
the substrate for RubisCO. PRKs of diverse origins show different properties with respect to the size of the protein,
the subunit structure, or the enzymatic regulation. However, an alignment of the sequences of PRK from plants, algae,
photosynthetic and chemoautotrophic bacteria shows that there are a few regions of sequence similarity. As a signature
pattern, one of these regions was selected.
[1168] Consensus pattern: K-[LIVM]-x-R-D-x(3)-R-G-x-[ST]-x-E
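Consensus patterns of this kind use a compact notation: single letters are fixed residues, square brackets list allowed residues, x is any residue, and a number in parentheses is a repeat count. The sketch below shows one way to turn such a pattern into a regular expression. It assumes the phosphoribulokinase pattern reads K-[LIVM]-x-R-D-x(3)-R-G-x-[ST]-x-E; the function name and the test sequence are hypothetical, and only the notation used in this section is handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Supports literal residues, [..] allowed sets, x wildcards, and
    (n) or (n,m) repeat counts. Other PROSITE syntax is rejected.
    """
    parts = []
    element_re = r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?"
    for element in pattern.split("-"):
        m = re.fullmatch(element_re, element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        core = "." if core == "x" else core
        if lo is None:
            parts.append(core)
        elif hi is None:
            parts.append(f"{core}{{{lo}}}")
        else:
            parts.append(f"{core}{{{lo},{hi}}}")
    return "".join(parts)


# Assumed reading of the phosphoribulokinase signature.
PRK = "K-[LIVM]-x-R-D-x(3)-R-G-x-[ST]-x-E"
prk_regex = prosite_to_regex(PRK)

# Hypothetical sequence built to contain one match; not a real protein.
hit = re.search(prk_regex, "MMKLARDAAARGASAEGG")
print(prk_regex, hit.start() if hit else None)
```

A match only indicates that a sequence is compatible with the signature; it does not by itself establish that the protein is a phosphoribulokinase.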

[ 1] Kossmann J., Klintworth R., Bowien B. Gene 85:247-252(1989).

[ 2] Gibson J.L., Chen J.-H., Tower P.A., Tabita F.R. Biochemistry 29:8085-8093(1990).

[1169] 423. (PRPP synt) Phosphoribosyl pyrophosphate synthetase signature 

Phosphoribosyl pyrophosphate synthetase (EC 2.7.6.1) (PRPP synthetase) catalyzes the formation of PRPP from ATP
and ribose 5-phosphate. PRPP is then used in various biosynthetic pathways, for example in the formation of purines,
pyrimidines, histidine and tryptophan. PRPP synthetase requires inorganic phosphate and magnesium ions for its
stability and activity.

In mammals, three isozymes of PRPP synthetase are found; in yeast there are at least four isozymes.
As a signature pattern for this enzyme, a very conserved region was selected that has been suggested to be involved
in binding divalent cations [1]. This region contains two conserved aspartic acid residues as well as a histidine, which
are all potential ligands for a cation such as magnesium.

[1170] Consensus pattern: D-[LI]-H-[SA]-x-Q-[IMST]-[QM]-G-[FY]-F-x(2)-P-[LIVMFC]-D

[1171] [ 1] Bower S.G., Harlow K.W., Switzer R.L., Hove-Jensen B. J. Biol. Chem. 264:10287-10291(1989).

[1172] 424. (PRTP) Herpesvirus processing and transport protein 

The members of this family are associated with capsid intermediates during packaging of the virus.
Number of members: 31

[1]

Medline: 98362148

Herpes simplex virus type 1 cleavage and packaging proteins
UL15 and UL28 are associated with B but not C capsids during
packaging. Yu D, Weller SK;

J Virol 1998;72:7428-7439. 
[1173] 425. Photosystem I psaG / psaK (PSI PSAK) proteins signature 

Photosystem I (PSI) [1] is an integral membrane protein complex that uses light energy to mediate electron transfer
from plastocyanin to ferredoxin. It is found in the chloroplasts of plants and in cyanobacteria. PSI is composed of at least
14 different subunits, two of which, PSI-G (gene psaG) and PSI-K (gene psaK), are small hydrophobic proteins of about
7 to 9 Kd and evolutionarily related [2]. Both seem to contain two transmembrane regions. Cyanobacteria seem to
encode only PSI-K.

[1174] As a signature pattern, the best-conserved region was selected, which seems to correspond to the second
transmembrane region.

- Consensus pattern: [GT]-F-x-[LIVM]-x-[DEA]-x(2)-[GA]-x-[GTA]-[SA]-x-G-H-x-[LIVM]-[GA]
[ 1] Golbeck J.H. Biochim. Biophys. Acta 895:167-204(1987).

[ 2] Kjaerulff S., Andersen B., Nielsen V.S., Moller B.L., Okkels J.S. J. Biol. Chem. 268:18912-18916(1993).
[1175] 426. PTR2 family proton/oligopeptide symporters signatures 

A family of eukaryotic and prokaryotic proteins that seem to be mainly involved in the intake of small peptides with the
concomitant uptake of a proton has been recently characterized [1,2]. Proteins that belong to this family are:

- Fungal peptide transporter PTR2.
- Mammalian intestine proton-dependent oligopeptide transporter PeptT1.
- Mammalian kidney proton-dependent oligopeptide transporter PeptT2.
- Drosophila opt1.
four PP2C homologs: phosphatase PTC1 [2], which has weak tyrosine phosphatase activity in addition to its activity
on ^""^^Phatases PTC2 and PTC3. and hypothetical protein YBR125c Isozymes of P "c^e a£ kIw^ 

In Arabidopsis thaliana, the kinase associated protein phosphatase (KAPP) [3] is an enzyme that dephosphorylates
the Ser/Thr receptor-like kinase RLK5 and which contains a C-terminal PP2C domain.

^ '° *^ evolutionaor related to the main family of serine/ threonine phosphatases- PPi PP2A and 

(PDPC) [4], which catalyzes dephosphorylation and concomitant reactivation of the alpha subunit of the E1 component
of the pyruvate dehydrogenase complex. PDPC is a mitochondrial enzyme and, like PP2C, is magnesium-dependent.
As a signature pattern, the best conserved region was selected, which is located in the N-terminal part and contains a
perfectly conserved tripeptide. This region includes a conserved aspartate residue involved in divalent cation binding.

[1159] Consensus pattern: [LIVMFY]-[LIVMFYA]-[GSAC]-[LIVM]-[FYC]-D-G-H-[GAV]

- Note: PP2C belongs [6] to a superfamily which also includes bacterial proteins such as Bacillus spoIIE, rsbU and
rsbW, Synechocystis PCC 6803 icfG as well as a domain in fungal adenylate cyclase.

[ 1] Wenk J., Trompeter H.-I., Pettrich K.-G., Cohen P.T.W., Campbell D.G., Mieskes G. FEBS Lett. 297:135-138

[ 2] Maeda T., Tsai A.Y.M., Saito H. Mol. Cell. Biol. 13:5408-5417(1993).

[ 3] Stone J.M., Collinge M.A., Smith R.D., Horn M.A., Walker J.C. Science 266:793-795(1994).

[ 4] Yan J., Reed L.J. Biochemistry 32:8987-8993(1993).

[ 5] Das A.K., Helps N.R., Cohen P.T.W., Barford D. EMBO J. 15:6798-6809(1996).
[ 6] Bork P., Brown N.P., Hegyi H., Schultz J. Protein Sci. 5:1421-1425(1996).

[1160] 419. (PPTA) Protein prenyltransferases alpha subunit repeat signature

Protein prenyltransferases catalyze the transfer of an isoprenyl moiety to a cysteine four residues from the C-terminus
of several proteins. They are heterodimeric enzymes consisting of alpha and beta subunits. The alpha subunit is thought
to participate in a stable complex with the isoprenyl substrate; the beta subunit binds the peptide substrate. Distinct
protein prenyltransferases might share a common alpha subunit. Both the alpha and beta subunits show repetitive
sequence motifs [1]. These repeats have distinct structural and functional implications and are unrelated to each other.
Known protein prenyltransferase alpha subunits are:

- Mammalian protein farnesyltransferase alpha subunit.

- Yeast protein RAM2, a protein farnesyltransferase alpha subunit.

- Yeast protein BET4, a protein geranylgeranyltransferase alpha subunit.

The conserved domain of the alpha subunit consists of about 34 amino acids and is repeated five times.
diZslTL'rr'K heterod^erizatcn wHh the olZ^ Z^^ZT.V ^Z 

[1162] [ 1] Boguski M.S., Murray A.W., Powers S. New Biol. 4:408-411(1992).
[1163] 420. (PR55) Protein phosphatase 2A regulatory subunit PR55 signatures 

Protein phosphatase 2A (PP2A) is a serine/threonine phosphatase involved in many aspects of cellular function,
including the regulation of metabolic enzymes and proteins involved in signal transduction. PP2A is a trimeric

slTlt °^ ' T " ^•"'y"= ^""""^ ^^^'^'^ - 65 Kd regul^^ subu^ (pS) a^ 

subunit A: this complex then associates with a third variable subunit (subunit 8) which amlerrrii^i^i'Z^ f 

m:mlTE^L2^ °' '^"^'^ ^ KdtlnV^) SrSJ 'oZZ n° 

mammals - where three isoforms are known to exist Drosophila and veast taene CDr «;«;\ Thic o. .k . , "^^ ^^ 

[1164] Consensus pattern: E-F-D-Y-L-K-S-L-E-I-E-E-K-I-N

Consensus pattern: N-[AG]-H-[TA]-Y-H-I-N-S-I-S-[LIVM]-N-S-D

[1165] [ 1] Mayer-Jaekel R., Hemmings B.A. Trends Cell Biol. 4:287-291(1994).



A duplicated catalytic motif in a new superfamily of phosphohydrolases and phospholipid synthases that includes
poxvirus envelope proteins.
Koonin EV;

Trends Biochem Sci 1996;21:242-243.

[3]Medline: 94327597 

Cloning and expression of phosphatidylcholine-hydrolyzing phospholipase D from Ricinus communis L.
Wang X, Xu L, Zheng L;
J Biol Chem 1994;269:20312-20317. 
[4]Medline: 97386825 

Regulation of eukaryotic phosphatidylinositol-specific phospholipase C and phospholipase D.
Singer WD, Brown HA, Sternweis PC;

Annu Rev Biochem 1997;66:475-509. 

[1154] 416. (PMI type I) Phosphomannose isomerase type I signatures

Phosphomannose isomerase (EC 5.3.1.8) (PMI) [1,2] is the enzyme that catalyzes the interconversion of
mannose-6-phosphate and fructose-6-phosphate. In eukaryotes, it is involved in the synthesis of GDP-mannose, which is a
constituent of N- and O-linked glycans as well as GPI anchors. In prokaryotes, it is involved in a variety of pathways
including capsular polysaccharide biosynthesis and D-mannose metabolism.

Three classes of PMI have been defined on the basis of sequence similarities [1]. The first class comprises all known
eukaryotic PMI as well as the enzyme encoded by the manA gene in enterobacteria such as Escherichia coli. Class I
PMIs are proteins of about 42 to 50 Kd which bind a zinc ion essential for their activity.

As signature patterns for class I PMI, two conserved regions were selected. The first one is located in the N-terminal
section of these proteins, the second in the C-terminal half. Both patterns contain a residue involved [3] in the binding
of the zinc ion.

[1155] Consensus pattern: Y-x-D-x-N-H-K-P-E [E is a zinc ligand]

- Consensus pattern: H-A-Y-[LIVM]-x-G-x(2)-[LIVM]-E-x-M-A-x-S-D-N-x-[LIVM]-R-A-G-x-T-P-K [H is a zinc ligand]
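The bracketed annotations identify which residue in each signature is the zinc ligand, so a match can be used to report the ligand's position in a sequence. The sketch below assumes the two class I signatures read Y-x-D-x-N-H-K-P-E and H-A-Y-[LIVM]-x-G-x(2)-[LIVM]-E-x-M-A-x-S-D-N-x-[LIVM]-R-A-G-x-T-P-K. The function name and the test sequence are hypothetical and serve only to illustrate the bookkeeping.

```python
import re

# Signature 1, with the final E captured as the annotated zinc ligand.
SIG1 = re.compile(r"Y.D.NHKP(?P<ligand>E)")
# Signature 2, with the leading H captured as the annotated zinc ligand.
SIG2 = re.compile(r"(?P<ligand>H)AY[LIVM].G.{2}[LIVM]E.MA.SDN.[LIVM]RAG.TPK")


def zinc_ligand_positions(seq: str):
    """Return (region, residue, 1-based position) for each signature hit."""
    hits = []
    for region, sig in (("N-terminal", SIG1), ("C-terminal", SIG2)):
        for m in sig.finditer(seq):
            hits.append((region, m.group("ligand"), m.start("ligand") + 1))
    return hits


# Hypothetical sequence constructed to contain both signatures once.
demo = "MA" + "YADGNHKPE" + "LLLL" + "HAYLSGTTVEQMARSDNPIRAGQTPK"
for hit in zinc_ligand_positions(demo):
    print(hit)
```

Positions are reported 1-based, following the usual convention for protein residue numbering.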

[ 1] Proudfoot A.E.I., Turcatti G., Wells T.N.C., Payton M.A., Smith D.J. Eur. J. Biochem. 219:415-423(1994).
[ 2] Coulin R., Magnenat E., Proudfoot A.E.I., Payton M.A., Scully R., Wells T.N.C. Biochemistry 32:14139-14144(1993).

[ 3] Cleasby A., Wonacott A., Skarzynski T., Hubbard R.E., Davies G.J., Proudfoot A.E.I., Bernard A.R., Payton
M.A., Wells T.N.C. Nat. Struct. Biol. 3:470-479(1996).

[1156] 417. (PNP UDP 1) Purine and other phosphorylases family 1 signature 
The following phosphorylases belong to the same family:

- Purine nucleoside phosphorylase (EC 2.4.2.1) (PNP) from most bacteria (gene deoD). This enzyme catalyzes the
cleavage of guanosine or inosine to respective bases and sugar-1-phosphate molecules [1].

- Uridine phosphorylase (EC 2.4.2.3) (UdRPase) from bacteria (gene udp) and mammals. Catalyzes the cleavage
of uridine into uracil and ribose-1-phosphate. The products of the reaction are used either as carbon and energy
sources or in the rescue of pyrimidine bases for nucleotide synthesis [2].

- 5'-methylthioadenosine phosphorylase (EC 2.4.2.28) (MTA phosphorylase) from Sulfolobus solfataricus [3].

As a signature pattern, a conserved region was selected in the central part of these enzymes.
[1157] Consensus pattern: [GST]-x-G-[LIVM]-G-x-[PA]-S-x-[GSTA]-I-x(3)-E-L

- Note: it should be noted that mammalian and some bacterial PNP as well as eukaryotic MTA phosphorylase belong
to a different family of phosphorylases (see <PDOC00954>).

[ 1] Takehara M., Ling R., Izawa S., Inoue Y., Kimura A. Biosci. Biotechnol. Biochem. 59:1987-1990(1995).

[ 2] Watanabe S.-I., Hino A., Wada K., Eliason J.R., Uchida T. J. Biol. Chem. 270:12191-12196(1995).

[ 3] Cacciapuoti G., Porcelli M., Bertoldo C., De Rosa M., Zappia V. J. Biol. Chem. 269:24762-24769(1994).

[1158] 418. (PP2C) Protein phosphatase 2C signature 

Protein phosphatase 2C (PP2C) is one of the four major classes of mammalian serine/threonine specific protein
phosphatases (EC 3.1.3.16). PP2C [1] is a monomeric enzyme of about 42 Kd which shows broad substrate specificity and
is dependent on divalent cations (mainly manganese and magnesium) for its activity. Its exact physiological role is still
unclear. Three isozymes are currently known in mammals: PP2C-alpha, -beta and -gamma. In yeast, there are at least






[ 3] Rhee S.G., Choi K.D. J. Biol. Chem. 267:12393-12396(1992).

[ 4] Sternweis P.C., Smrcka A.V. Trends Biochem. Sci. 17:502-506(1992).

[1149] 413. (PI-PLC-Y) Phosphatidylinositol-specific phospholipase C profiles

Phosphatidylinositol-specific phospholipase C (EC 3.1.4.11), a eukaryotic intracellular enzyme, plays an important
role in signal transduction processes [1]. It catalyzes the hydrolysis of 1-phosphatidyl-D-myo-inositol-3,4,5-triphosphate
into the second messenger molecules diacylglycerol and inositol-1,4,5-triphosphate. This catalytic process is tightly
regulated by reversible phosphorylation and binding of regulatory proteins [2 to 4].

In mammals, there are at least 6 different isoforms of PI-PLC; they differ in their domain structure, their regulation and
their tissue distribution. Lower eukaryotes also possess multiple isoforms of PI-PLC.

All eukaryotic PI-PLCs contain two regions of homology, sometimes referred to as 'X-box' and 'Y-box'. The order of
these two regions is always the same (NH2-X-Y-COOH), but the spacing is variable. In most isoforms, the distance
between these two regions is only 50-100 residues, but in the gamma isoforms one PH domain, two SH2 domains and
one SH3 domain are inserted between the two PLC-specific domains. The two conserved regions have been shown
to be important for the catalytic activity. At the C-terminal of the Y-box, there is a C2 domain (see <PDOC00380>)
possibly involved in Ca-dependent membrane attachment.

Profile analysis shows that sequences with significant similarity to the X-box domain occur also in prokaryotic and
trypanosome PI-specific phospholipases C. Apart from this region, the prokaryotic enzymes show no similarity to their
eukaryotic counterparts.

Two profiles were developed, one covering the X-box, the other the Y-box. 

[ 1] Meldrum E., Parker P.J., Carozzi A. Biochim. Biophys. Acta 1092:49-71(1991).

[ 2] Rhee S.G., Choi K.D. Adv. Second Messenger Phosphoprotein Res. 26:35-61(1992).

[ 3] Rhee S.G., Choi K.D. J. Biol. Chem. 267:12393-12396(1992).

[ 4] Sternweis P.C., Smrcka A.V. Trends Biochem. Sci. 17:502-506(1992).

[1150] 414. (PK) Pyruvate kinase active site signature

Pyruvate kinase (EC 2.7.1.40) (PK) [1] catalyzes the final step in glycolysis, the conversion of phosphoenolpyruvate
to pyruvate with the concomitant phosphorylation of ADP to ATP. PK requires both magnesium and potassium ions for
its activity. PK is found in all living organisms. In vertebrates there are four tissue-specific isozymes: L (liver), R (red
cells), M1 (muscle, heart, and brain), and M2 (early fetal tissues). In Escherichia coli there are two isozymes: PK-I
(gene pykF) and PK-II (gene pykA). All PK isozymes seem to be tetramers of identical subunits of about 500 amino
acid residues.

As a signature pattern for PK a conserved region was selected that includes a lysine residue which seems to be the
acid/base catalyst responsible for the interconversion of pyruvate and enolpyruvate, and a glutamic acid residue
implicated in the binding of the magnesium ion.

[1151] Consensus pattern: [LIVAC]-x-[LIVM](2)-[SAPCV]-K-[LIV]-E-[NKRST]-x-[DEQHS]-[GSTA]-[LIVM] [K is the
active site residue] [E is a magnesium ligand]

[1152] [ 1] Muirhead H. Biochem. Soc. Trans. 18:193-196(1990).
[1153] 415. (PLDc) Phospholipase D. Active site motif

Phosphatidylcholine-hydrolyzing phospholipase D (PLD) isoforms are activated by ADP-ribosylation factors (ARFs).
PLD produces phosphatidic acid from phosphatidylcholine, which may be essential for the formation of certain types
of transport vesicles or may be constitutive vesicular transport to signal transduction pathways.

PC-hydrolyzing PLD is a homologue of cardiolipin synthase, phosphatidylserine synthase, bacterial PLDs, and viral
proteins.

Each of these appears to possess a domain duplication which is apparent by the presence of two motifs containing
well-conserved histidine, lysine, aspartic acid, and/or asparagine residues which may contribute to the active site. An
E. coli endonuclease (nuc) and similar proteins appear to be PLD homologues but possess only one of these motifs.
The profile contained here represents only the putative active site regions, since an accurate multiple alignment of the
repeat units has not been achieved.

Number of members: 139 
[1] 

Medline: 96303814 

A novel family of phospholipase D homologues that includes phospholipid synthases and putative endonucleases:
identification of duplicated repeats and potential active site residues.
Ponting CP, Kerr ID; 

Protein Sci 1996;5:914-922. 

[2]Medline: 96334293 
Insulin Receptor Substrate 1 (IRS-1).

- Regulators of small G-proteins like guanine nucleotide releasing factor GNRP (Ras-GRF) (which contains 2 PH
domains), guanine nucleotide exchange proteins like vav, dbl, SoS and yeast CDC24, GTPase activating proteins
like rasGAP and BEM2/IPL2, and the human break point cluster protein bcr.

- Cytoskeletal proteins such as dynamin (see <PDOC00362>), Caenorhabditis elegans kinesin-like protein unc-104
(see <PDOC00343>), spectrin beta-chain, syntrophin (2 PH domains) and yeast nuclear migration protein NUM1.

- Mammalian phosphatidylinositol-specific phospholipase C (PI-PLC) (see <PDOC50007>) isoforms gamma and
delta. Isoform gamma contains two PH domains; the second one is split into two parts separated by about 400
residues.

- Oxysterol binding proteins OSBP, yeast OSH1 and YHR073w.

- Mouse protein citron, a putative rho/rac effector that binds to the GTP-bound forms of rho and rac.

- Several yeast proteins involved in cell cycle regulation and bud formation like BEM2, BEM3, BUD4 and the
BEM1-binding proteins BOI2 (BEB1) and BOI1 (BOB1).

- Caenorhabditis elegans protein MIG-10.

- Caenorhabditis elegans hypothetical proteins C04D8.1, K06H7.4 and ZK632.12.

- Yeast hypothetical proteins YBR129c and YHR155w.

The profile for the PH domain, which has been developed by Toby Gibson at the EMBL, covers the total length of
the domain. Several proteins contain large insertions in the PH domain and are thus difficult to detect with this profile. In
some of these cases, the profile will align only to one half of the PH domain.

- Sequences known to belong to this class detected by the pattern: ALL. But it should be noted that while all
sequences containing PH domains are detected, not all PH domains are. Some of the split domains lie below the
cutoff threshold.

[ 1] Mayer B.J., Ren R., Clark K.L., Baltimore D. Cell 73:629-630(1993).
[ 2] Haslam R.J., Koide H.B., Hemmings B.A. Nature 363:309-310(1993).
[ 3] Musacchio A., Gibson T.J., Rice P., Thompson J., Saraste M. Trends Biochem. Sci. 18:343-348(1993).
[ 4] Gibson T.J., Hyvonen M., Musacchio A., Saraste M., Birney E. Trends Biochem. Sci. 19:349-353(1994).
[ 5] Pawson T. Nature 373:573-580(1995).
[ 6] Ingley E., Hemmings B.A. J. Cell. Biochem. 56:436-443(1994).
[ 7] Saraste M., Hyvonen M. Curr. Opin. Struct. Biol. 5:403-408(1995).
[ 8] Riddihough G. Nat. Struct. Biol. 1:755-757(1994).

411. PHD-finger 
[1] 

Medline: 95216093 

The PHD finger: implications for chromatin-mediated transcriptional regulation.
Aasland R, Gibson TJ, Stewart AF;

Trends Biochem Sci 1995;20:56-59. 
Number of members: 181 

[1148] 412. (PI-PLC-X) Phosphatidylinositol-specific phospholipase C profiles

Phosphatidylinositol-specific phospholipase C (EC 3.1.4.11), a eukaryotic intracellular enzyme, plays an important
role in signal transduction processes [1]. It catalyzes the hydrolysis of 1-phosphatidyl-D-myo-inositol-3,4,5-triphosphate
into the second messenger molecules diacylglycerol and inositol-1,4,5-triphosphate. This catalytic process is tightly
regulated by reversible phosphorylation and binding of regulatory proteins [2 to 4].

In mammals, there are at least 6 different isoforms of PI-PLC; they differ in their domain structure, their regulation, and
their tissue distribution. Lower eukaryotes also possess multiple isoforms of PI-PLC.

All eukaryotic PI-PLCs contain two regions of homology, sometimes referred to as 'X-box' and 'Y-box'. The order of
these two regions is always the same (NH2-X-Y-COOH), but the spacing is variable. In most isoforms, the distance
between these two regions is only 50-100 residues, but in the gamma isoforms one PH domain, two SH2 domains, and
one SH3 domain are inserted between the two PLC-specific domains. The two conserved regions have been shown
to be important for the catalytic activity. At the C-terminal of the Y-box, there is a C2 domain (see <PDOC00380>)
possibly involved in Ca-dependent membrane attachment.

Profile analysis shows that sequences with significant similarity to the X-box domain occur also in prokaryotic and
trypanosome PI-specific phospholipases C. Apart from this region, the prokaryotic enzymes show no similarity to their
eukaryotic counterparts.

Two profiles were developed, one covering the X-box, the other the Y-box.

[ 1] Meldrum E., Parker P.J., Carozzi A. Biochim. Biophys. Acta 1092:49-71(1991).
[ 2] Rhee S.G., Choi K.D. Adv. Second Messenger Phosphoprotein Res. 26:35-61(1992).
[1144] [ 1] Watson H.C., Littlechild J.A. Biochem. Soc. Trans. 18:187-190(1990).

[1145] 409. (PGM PMM) Phosphoglucomutase and phosphomannomutase phosphoserine signature

- Phosphoglucomutase (EC 5.4.2.2) (PGM). PGM is an enzyme responsible for the conversion of D-glucose 1-phosphate
into D-glucose 6-phosphate. PGM participates in both the breakdown and synthesis of glucose [1].

- Phosphomannomutase (EC 5.4.2.8) (PMM). PMM is an enzyme responsible for the conversion of D-mannose
1-phosphate into D-mannose 6-phosphate. PMM is required for different biosynthetic pathways in bacteria. For
example, in enterobacteria such as Escherichia coli there are two different genes coding for this enzyme: rfbK,
which is involved in the synthesis of the O antigen of lipopolysaccharide, and cpsG, which is involved in the synthesis
of the M antigen capsular polysaccharide [2]. In Pseudomonas aeruginosa PMM (gene algC) is involved in the
biosynthesis of the alginate layer [3] and in Xanthomonas campestris (gene xanA) it is involved in the biosynthesis
of xanthan [4]. In Rhizobium strain NGR234 (gene noeK) it is involved in the biosynthesis of the nod factor.

- Phosphoacetylglucosamine mutase (EC 5.4.2.3), which converts N-acetyl-D-glucosamine 1-phosphate into the
6-phosphate isomer.

The catalytic mechanism of both PGM and PMM involves the formation of a phosphoserine intermediate [1]. The
sequence around the serine residue is well conserved and can be used as a signature pattern.

In addition to PGM and PMM there are at least three uncharacterized proteins that belong to this family [5,6]:

- Urease operon protein ureC from Helicobacter pylori.
- Escherichia coli protein mrsA.
- Paramecium tetraurelia parafusin, a phosphoglycoprotein involved in exocytosis.
- A Methanococcus vannielii hypothetical protein in the 3' region of the gene for ribosomal protein S10.

111461 Consensus pattern: [GSAHUVMJ.x4LIVM].[SWGA]-S-H-x.P-x{4H^ [S is the phosphoserine resi- 

- Note: PMM from fungi does not belong to this family.

[ 1] Dai J.B., Liu Y., Ray W.J. Jr., Konno M. J. Biol. Chem. 267:6322-6337(1992).
[ 2] Stevenson G., Lee S.J., Romana L.K., Reeves P.R. Mol. Gen. Genet. 227:173-180(1991).
[ 3] Zielinski N.A., Chakrabarty A.M., Berry A. J. Biol. Chem. 266:9754-9763(1991).
[ 4] Koeplin R., Arnold W., Hoette B., Simon R., Wang G., Puehler A. J. Bacteriol. 174:191-199(1992).
[ 5] Bairoch A. Unpublished observations (1993).
[ 6] Subramanian S.V., Wyroba E., Andersen A.P., Satir B.K. Proc. Natl. Acad. Sci. U.S.A. 91:9832-9836(1994).
[1147] 410. PH domain profile

The 'pleckstrin homology' (PH) domain is a domain of about 100 residues that occurs in a wide range of proteins
involved in intracellular signaling or as constituents of the cytoskeleton [1 to 7].

s'lb'un^^^^^^ ^^^^"^ "^"^^^^ '^^^^^ -^gasted: - biding to the beta/gamma 

- binding to lipids, e.g. phosphatidylinositol-4,5-bisphosphate,

- binding to phosphorylated Ser/Thr residues,

- attachment to membranes by an unknown mechanism.

It is possible that different PH domains have totally different ligand requirements.

The 3D structure of several PH domains has been determined [8]. All known cases have a common structure consisting
of two perpendicular anti-parallel beta sheets, followed by a C-terminal amphipathic helix. The loops connecting the

^:r^hrth%?srr:r ""^'^ ^^^'^^^ - - 

Proteins reported to contain one or more PH domains belong to the following families:

- Pleckstrin, the protein where this domain was first detected, is the major substrate of protein kinase C in platelets.
Pleckstrin is one of the rare proteins to contain two PH domains.

- Ser/Thr protein kinases such as the Akt/Rac family, the beta-adrenergic receptor kinases, the mu isoform of PKC
and the trypanosomal NrkA family.

- Tyrosine protein kinases belonging to the Btk/ltk/Tec subfamily. 
of phosphoglycerate [1,2]. Both enzymes can catalyze three different reactions, although in different proportions:

- The isomerization of 2-phosphoglycerate (2-PGA) to 3-phosphoglycerate (3-PGA) with 2,3-diphosphoglycerate
(2,3-DPG) as the primer of the reaction.

- The synthesis of 2,3-DPG from 1,3-DPG with 3-PGA as a primer.

- The degradation of 2,3-DPG to 3-PGA (phosphatase EC 3.1.3.13 activity).

In mammals, PGAM is a dimeric protein. There are two isoforms of PGAM: the M (muscle) and B (brain) forms. In
yeast, PGAM is a tetrameric protein. BPGM is a dimeric protein and is found mainly in erythrocytes, where it plays a
major role in regulating hemoglobin oxygen affinity as a consequence of controlling 2,3-DPG concentration.
The catalytic mechanism of both PGAM and BPGM involves the formation of a phosphohistidine intermediate [3].
The bifunctional enzyme 6-phosphofructo-2-kinase / fructose-2,6-bisphosphatase (EC 2.7.1.105 and EC 3.1.3.46)
(PF2K) [4] catalyzes both the synthesis and the degradation of fructose-2,6-bisphosphate. PF2K is an important
enzyme in the regulation of hepatic carbohydrate metabolism. Like PGAM/BPGM, the fructose-2,6-bisphosphatase
reaction involves a phosphohistidine intermediate, and the phosphatase domain of PF2K is structurally related to
PGAM/BPGM.

The bacterial enzyme alpha-ribazole-5'-phosphate phosphatase (gene cobC), which is involved in cobalamin
biosynthesis, also belongs to this family [5].

A signature pattern was built around the phosphohistidine residue.

[1140] Consensus pattern: [LIVM]-x-R-H-G-[EQ]-x(3)-N [H is the phosphohistidine residue]

- Note: some organisms harbor a form of PGAM independent of 2,3-DPG; this enzyme is not related to the family
described above [6].

[ 1] Le Boulch P., Joulin V., Garel M.-C., Rosa J., Cohen-Solal M. Biochem. Biophys. Res. Commun. 156:874-881(1988).

[ 2] White M.F., Fothergill-Gilmore L.A. FEBS Lett. 229:383-387(1988).
[ 3] Rose Z.B. Meth. Enzymol. 87:43-51(1982).

[ 4] Bazan J.F., Fletterick R.J., Pilkis S.J. Proc. Natl. Acad. Sci. U.S.A. 86:9642-9646(1989).

[ 5] O'Toole G.A., Trzebiatowski J.R., Escalante-Semerena J.C. J. Biol. Chem. 269:26503-26511(1994).

[ 6] Grana X., De Lecea L., El-Maghrabi M.R., Urena J.M., Caellas C., Carreras J., Puigdomenech P., Pilkis S.J.,
Climent F. J. Biol. Chem. 267:12797-12803(1992).

[1141] 407. (PGI) Phosphoglucose isomerase signatures 

Phosphoglucose isomerase (EC 5.3.1.9) (PGI) [1,2] is a dimeric enzyme that catalyzes the reversible isomerization of
glucose-6-phosphate and fructose-6-phosphate. PGI is involved in different pathways: in most higher organisms it is
involved in glycolysis; in mammals it is involved in gluconeogenesis; in plants in carbohydrate biosynthesis; in some
bacteria it provides a gateway for fructose into the Entner-Doudoroff pathway. PGI has been shown [3] to be identical
to neuroleukin, a neurotrophic factor which supports the survival of various types of neurons.

The sequence of PGI from many species ranging from bacteria to mammals is available and has been shown to be
highly conserved. As signature patterns for this enzyme two conserved regions were selected; the first region is located
in the central section of PGI, while the second one is located in its C-terminal section.

[1142] Consensus pattern: [DENS]-x-[LIVM]-G-G-R-[FY]-S-[LIVMT]-x-[STA]-[PSAC]-[LIVMA]-G

- Consensus pattern: [GS]-x-[LIVM]-[LIVMFYW]-x(4)-[FY]-[DN]-Q-x-G-V-E-x(2)-K
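Where an entry defines two signatures with stated locations (here a central region and a C-terminal region), both patterns and their relative order can be checked together. The sketch below assumes the two phosphoglucose isomerase patterns read [DENS]-x-[LIVM]-G-G-R-[FY]-S-[LIVMT]-x-[STA]-[PSAC]-[LIVMA]-G and [GS]-x-[LIVM]-[LIVMFYW]-x(4)-[FY]-[DN]-Q-x-G-V-E-x(2)-K. The function name, the ordering rule as implemented, and the test fragments are illustrative assumptions rather than a prescribed classification procedure.

```python
import re

# Assumed reading of the central-region signature.
CENTRAL = re.compile(r"[DENS].[LIVM]GGR[FY]S[LIVMT].[STA][PSAC][LIVMA]G")
# Assumed reading of the C-terminal signature.
C_TERMINAL = re.compile(r"[GS].[LIVM][LIVMFYW].{4}[FY][DN]Q.GVE.{2}K")


def has_pgi_signatures(seq: str) -> bool:
    """True if the central signature occurs and the C-terminal one follows it."""
    first = CENTRAL.search(seq)
    if first is None:
        return False
    # Search for the second signature only downstream of the first hit.
    return C_TERMINAL.search(seq, first.end()) is not None


# Hypothetical fragments constructed to satisfy each pattern.
central_hit = "DALGGRFSLASPLG"
cterm_hit = "GALLAAAAFDQAGVEAAK"

print(has_pgi_signatures("MM" + central_hit + "TTTT" + cterm_hit))
print(has_pgi_signatures(cterm_hit + "TTTT" + central_hit))
```

Requiring both signatures in the expected order is stricter than accepting either one alone, which trades some sensitivity for fewer chance matches.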

[ 1] Achari A., Marshall S.E., Muirhead H., Palmieri R.H., Noltmann E.A. Philos. Trans. R. Soc. Lond. B Biol.
Sci. 293:145-157(1981).

[ 2] Smith M.W., Doolittle R.F. J. Mol. Evol. 34:544-545(1992).

[ 3] Faik P., Walker J.I.H., Redmill A.A.M., Morgan M.J. Nature 332:455-456(1988).

[1143] 408. (PGK) Phosphoglycerate kinase signature 

Phosphoglycerate kinase (EC 2.7.2.3) (PGK) [1] catalyzes the second step in the second phase of glycolysis, the
reversible conversion of 1,3-diphosphoglycerate to 3-phosphoglycerate with generation of one molecule of ATP. PGK
is found in all living organisms and its sequence has been highly conserved throughout evolution. It is a two-domain
protein; each domain is composed of six repeats of an alpha/beta structural motif. As a signature pattern for PGK's, a
conserved region in the N-terminal region was selected.
Consensus pattern: [KRHGTCVN]-[VT]-[LIVMF]-[LIVMC]-R-x-D-x-N-[SACV]-P
region was selected that contains four acidic residues and which is located in the central part of the enzyme. The
pattern is located to the C-terminus of an ATP-binding motif 'A' (P-loop) (see
<PDOC00017>) and is also part of the ATP-binding domain [2].

[1130] Consensus pattern: L-I-G-D-D-E-H-x-W-x-[DE]-x-G-[IV]-x-N

- Note: phosphoenolpyruvate carboxykinase (GTP) (EC 4.1.1.32), an enzyme that catalyzes the same reaction but
using GTP instead of ATP, is not related to the above enzyme (see <PDOC00421>).

[ 1] Medina V., Pontarollo R., Glaeske D., Tabel H., Goldie H. J. Bacteriol. 172:7151-7156(1990).
[ 2] Matte A., Goldie H., Sweet R.M., Delbaere L.T.J. J. Mol. Biol. 256:126-143(1996).

[1131] 403. (PEPcase) Phosphoenolpyruvate carboxylase active sites. Phosphoenolpyruvate carboxylase (EC
4.1.1.31) (PEPcase) catalyzes the irreversible beta-carboxylation of phosphoenolpyruvate by bicarbonate to yield
oxaloacetate and phosphate. The enzyme is found in all plants and in a variety of microorganisms. A histidine [1] and
a lysine [2] have been implicated in the catalytic mechanism of this enzyme; the regions around these active site
residues are highly conserved in PEPcase from various plants, bacteria and cyanobacteria and can be used as
signature patterns for this type of enzyme.

[1132] Consensus pattern: [VT]-x-T-A-H-P-T-[EQ]-x(2)-R-[KRH] [H is an active site residue]
- Consensus pattern: [IV]-M-[LIVM]-G-Y-S-D-S-x-K-D-[STAG]-G [K is an active site residue]

[ 1] Terada K., Izui K. Eur. J. Biochem. 202:797-803(1991).
[ 2] Jiao J.-A., Podesta F.E., Chollet R., O'Leary M.H., Andreo C.S. Biochim. Biophys. Acta 1041:291-295(1990).

[1134] 404. PET112 family signature

The following proteins from eukaryotes, prokaryotes and archaebacteria belong to the same family:

• IX^^el^Z^^'' "'^^ " ""--"^ ^ -^-►'-^'al genes. 

- Aspergillus nidulans mitochondrial protein nempA.
- Bacillus subtilis hypothetical protein yzdD.
- Moraxella catarrhalis hypothetical protein in bloR-1 5'region.
- Mycoplasma genitalium hypothetical protein MG100.
- Methanococcus jannaschii hypothetical proteins MJ0019 and MJ0160.

t^r/litla^rrsTStr ' " ^'"'"^ ^"^^ ' ^"^^ ^ --'^"^ -te^ 

[1135] Consensus pattern: [DN]-x-[DN]-R-x(3)-P-L-[LIV]-E-[LIV]-x-[ST]-x-P
[1136] [ 1] Mulero J.J., Rosenthal J.K., Fox T.D. Curr. Genet. 25:299-304(1994).
[1137] 405. (PFK) Phosphofructokinase signature

Phosphofructokinase (EC 2.7.1.11) (PFK) [1,2] is a key regulatory enzyme in the glycolytic pathway. It catalyzes the
phosphorylation by ATP of fructose 6-phosphate to fructose 1,6-bisphosphate

Which are highly related to the bacterial 36 Kd subunits. In human there are three tissue-specific types of PFK
isozymes: PFKM (muscle), PFKL (liver), and PFKP (platelet). In yeast PFK is an octamer composed of four 100 Kd alpha
chains (gene PFK1) and four 100 Kd beta chains (gene PFK2); like the mammalian 80 Kd subunits, the yeast 100 Kd
subunits are composed of two homologous domains.

!lVSeT '"^ ''""'^ ^ '"'^^ '^"'^ '^^"^"^^ '"^"'"^ *" Jructose^phosphate binding 

!^S-P ?nl7"' "^"'""^ lRK]-x(4)-G-H-x^QRJ-G-G-x(5)-D-R Fhe R/K. the H and the CVR are invohred in fruc- 

* rmi!lr^'S!S« "* '««.»«°P^ho'"'ctokinase isozymes which are encoded by genes pfkA (major) and pfkB 
(minor). The pf kB «02yme is not evolutionary related to other prokaryotic or eukaryotic PFK's (see <PDOC005(S>" 

[ 1] Poorman R.A., Randolph A., Kemp R.G., Heinrikson R.L. Nature 309:467-469(1984).
[ 2] Heinisch J., Ritzel R.G., von Borstel R.C., Aguilera A., Rodicio R., Zimmermann F.K. Gene 78:309-321(1989).

[1139] 406. (PGAM) Phosphoglycerate mutase family phosphohistidine signature

Phosphoglycerate mutase (EC 5.4.2.1) (PGAM) and bisphosphoglycerate mutase (EC 5.4.2.4) (BPGM) are structurally
related enzymes which catalyze reactions involving the transfer of phospho groups between
A., Burgess P.M.J. Nucleic Acids Res. 18:261-265(1990).

[ 4] O'Reilly D.R., Crawford A.M., Miller L.K. Nature 337:606-606(1989).

[1121] 399. (PDT) Prephenate dehydratase signatures

Prephenate dehydratase (EC 4.2.1.51) (PDT) catalyzes the decarboxylation of prephenate into phenylpyruvate. In
microorganisms PDT is involved in the terminal pathway of the biosynthesis of phenylalanine. In some bacteria such
as Escherichia coli PDT is part of a bifunctional enzyme (P-protein) that also catalyzes the transformation of chorismate
into prephenate (chorismate mutase) while in other bacteria it is a monofunctional enzyme. The sequences of
monofunctional PDT align well with the C-terminal part of that of P-proteins [1].

As signature patterns for PDT two conserved regions were selected. The first region contains a conserved threonine 
which has been said to be essential for the activity of the enzyme in E. coli. The second region includes a conserved 
glutamate. Both regions are in the C-terminal part of PDT. 

[1122] Consensus pattern: [FY]-x-[LIVM]-x(2)-[LIVM]-x(5)-[DN]-x(5)-T-R-F-[LIVMW]-x-[LIVM]
[1123] [ 1] Fischer R.S., Zhao G., Jensen R.A. J. Gen. Microbiol. 137:1293-1301(1991).
[1124] 400. PDZ domain (Also known as DHR or GLGF). 
[1125] PDZ domains are found in diverse signaling proteins. 
[1126] [1] Ponting CP, Phillips C, Davies KE, Blake DJ; Bioessays 1997;19:469-479.
[2] Doyle DA, Lee A, Lewis J, Kim E, Sheng M, MacKinnon R; Cell 1996;85:1067-1076.
[3] Ponting CP; Protein Sci 1997;6:464-468.

[1127] 401. (PPDK_N_term) PEP-utilizing enzymes signatures

A number of enzymes that catalyze the transfer of a phosphoryl group from phosphoenolpyruvate (PEP) via a
phosphohistidine intermediate have been shown to be structurally related [1,2,3,4]. These enzymes are:

- Pyruvate,orthophosphate dikinase (EC 2.7.9.1) (PPDK). PPDK catalyzes the reversible phosphorylation of
pyruvate and phosphate by ATP to PEP and diphosphate. In plants PPDK functions in the direction of the formation of
PEP, which is the primary acceptor of carbon dioxide in C4 and crassulacean acid metabolism plants. In some
bacteria, such as Bacteroides symbiosus, PPDK functions in the direction of ATP synthesis.

- Phosphoenolpyruvate synthase (EC 2.7.9.2) (pyruvate,water dikinase). This enzyme catalyzes the reversible
phosphorylation of pyruvate by ATP to form PEP, AMP and phosphate, an essential step in gluconeogenesis when
pyruvate and lactate are used as a carbon source.

- Phosphoenolpyruvate-protein phosphotransferase (EC 2.7.3.9). This is the first enzyme of the
phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS), a major carbohydrate transport system in bacteria. The
PTS catalyzes the phosphorylation of incoming sugar substrates concomitant with their translocation across the
cell membrane. The general mechanism of the PTS is the following: a phosphoryl group from PEP is transferred
to enzyme-I (EI) of PTS which in turn transfers it to a phosphoryl carrier protein (HPr). Phospho-HPr then transfers
the phosphoryl group to a sugar-specific permease.

All these enzymes share the same catalytic mechanism: they bind PEP and transfer the phosphoryl group from it to a
histidine residue. The sequence around that residue is highly conserved and can be used as a signature pattern for
these enzymes. As a second signature pattern a conserved region was selected in the C-terminal part of the
PEP-utilizing enzymes. The biological significance of this region is not yet known.

[1128] Consensus pattern: G-[GA]-x-[TN]-x-H-[STA]-[STAV]-[LIVM](2)-[STAV]-[RG] [H is phosphorylated]

- Consensus pattern: [DEQSK]-x-[LIVMF]-S-[LIVMF]-G-[ST]-N-D-[LIVM]-x-Q-[LIVMFYGT]-[STALIV]-[LIVMF]-[GAS]-x(2)-R

[ 1] Reizer J., Hoischen C., Reizer A., Pham T.N., Saier M.H. Jr. Protein Sci. 2:506-521(1993).
[ 2] Reizer J., Reizer A., Merrick M.J., Plunkett G. III, Rose D.J., Saier M.H. Jr. Gene 181:103-108(1996).
[ 3] Pocalyko D.J., Carroll L.J., Martin B.M., Babbitt P.C., Dunaway-Mariano D. Biochemistry 29:10757-10765(1990).
[ 4] Niersbach M., Kreuzaler F., Geerse R.H., Postma P., Hirsch H.J. Mol. Gen. Genet. 232:332-336(1992).
[1129] 402. (PEPCK_ATP) Phosphoenolpyruvate carboxykinase (ATP) signature

Phosphoenolpyruvate carboxykinase (ATP) (EC 4.1.1.49) (PEPCK) [1] catalyzes the formation of phosphoenolpyruvate
by decarboxylation of oxaloacetate while hydrolyzing ATP, a rate limiting step in gluconeogenesis (the biosynthesis of
glucose).

The sequence of this enzyme has been obtained from Escherichia coli, yeast, and Trypanosoma brucei; these three
sequences are evolutionarily related and share many regions of similarity. As a signature pattern a highly conserved
and maturation. This protein belongs to a family that also includes:

- Drosophila antennal protein A5, a putative odorant-binding protein.
- Onchocerca volvulus antigen Ov-16 and the related proteins D1, D2 and D3.
- Plasmodium falciparum putative phosphatidylethanolamine-binding protein.
- Toxocara canis secreted antigen TES-26. This larval protein has been shown to bind phosphatidylethanolamine
' pS?S;79T ^^'^'^'^''''^''''''^^'^^^^^^^ 

- Caenorhabditis elegans hypothetical protein F40A3.3.

st,'ueS o.TC«eS:. ''''^ is tocated ^ me end 0. me firs, mird of .he 

[1114] Consensus pattern: [FYL]-x-[LV]-[LIVF]-x-[TIV]-[DC]-P-D-x-P-[SN]-x(10)-H

Lgl^SSc'i'gC""'' ^"""^ ^ - ^""^ J- Mc* EvoL 

[ 2] Schoentgen F., Jolles P. FEBS Lett. 369:22-26(1995).
[1115] 396. PCI domain 

This domain has also been called the PINT motif (Proteasome,
Int-6, Nip-1 and TRIP-15) [1].
Number of members: 49
[1]
Medline: 98308842
The PCI domain: a common theme in three multiprotein complexes
Hofmann K, Bucher P;
Trends Biochem Sci 1998;23:204-205.
[2]Medline: 98266368
Homologues of 26S proteasome subunits are regulators of transcription and translation
Aravind L, Ponting CP; 

Protein Sci 1998;7:1250-1254. 

[1117] 397. (PCMT) Protein-L-isoaspartate(D-aspartate) O-methyltransferase signature. Protein-L-isoaspartate(D-aspartate)
O-methyltransferase (EC 2.1.1.77) (PCMT) [1] (which is also known as L-isoaspartyl protein carboxyl
methyltransferase) is an enzyme that catalyzes the transfer of a methyl group from S-adenosylmethionine to the free carboxyl
groups of D-aspartyl or L-isoaspartyl residues in a variety of peptides and proteins.

IZ^ L-aspartyl and L-asparaginyl residues in p,o.eins. PCMT plays a role in the repair and/ordeg^^i^ oHh^ 

^i^T r W^conseived and wideV distributed cytosolic protein of about 24Kd As a sionature 

pattern, a consenred region in me central part of this enzyme has been developed ^ 
««« Consensus pattern; [GSA]-D-G-x(2)-G-{FYWV]-x{3HAS]-P-{FYHDNl-x-l - 

379 4(199^"^ "^"^"^ "^^"^ C - S. Comp. Biochem. Physiol 117b: 

[1119] 398. (PCNA) Proliferating cell nuclear antigen signatures

Proliferating cell nuclear antigen (PCNA) [1,2] is a protein involved in DNA replication by acting as a cofactor for DNA
polymerase delta, the polymerase responsible for leading strand DNA replication.

Jp«?f/r°'f " ^"'^'^ 7^^' '^^"^ ^"-^^ ^ associated wim polymerase III. the yeast analoq of polymerase 

wth^hr !' Z"^^^ ^""^ W »° »''9»'ly related to P(^A andVjroS^bl^SS 

wrth me viral encoded DNA polymerase. An homolog of PCNA is also found in archebac.eria a^«ed 

Sl-ST'"' '^'^HUVMFl-x-ILIVMAJ-x-(SAV]-(UVM^D-x-[NSAEJ-[HKRJ-(V.^x^^^^^^ 

- Consensus pattern: [RKA]-C-[DE]-[RH]-x(3)-[LIVMF]-x(3)-[LIVM]-x-[SGAN]-[LIVMF]-x-K-[LIVMF](2)
[ 1] Bravo R., Frank R., Blundell P.A., McDonald-Bravo H. Nature 326:515-517(1987).
[ 2] Suzuka I., Hata S., Matsuoka M., Kosugi S., Hashimoto J. Eur. J. Biochem. 195:571-575(1991).
[ 3] Bauer G.
[ 5] Ullrich S.J., Anderson C.W., Mercer W.E., Appella E. J. Biol. Chem. 267:15259-15262(1992).
[1107] 391. (P5CR) Delta 1-pyrroline-5-carboxylate reductase signature

Delta 1-pyrroline-5-carboxylate reductase (P5CR) (EC 1.5.1.2) [1,2] is the enzyme that catalyzes the terminal step in
the biosynthesis of proline from glutamate, the NAD(P)H-dependent reduction of 1-pyrroline-5-carboxylate into proline.
The sequences of P5CR from eubacteria (gene proC), archaebacteria and eukaryotes show only a moderate level of
overall similarity. As a signature pattern, the best conserved region located in the C-terminal section of P5CR was
selected.

[1108] Consensus pattern: [PALFpx(2,3)-[UVl-x{3).(LIVI^.[STAC]-[STVl-x-IGAN]-G-x-T-x(^^ 
[LMFHDENQK] w i j i j ^\^} 

[ 1] Delauney A.J., Verma D.P. Mol. Gen. Genet. 221:299-305(1990).
[ 2] Savioz A., Jeenes D.J., Kocher H.P., Haas D. Gene 86:107-111(1990).

[1109] 392. Poly-adenylate binding protein, unique domain. 

[1110] 393. (PAL) Phenylalanine and histidine ammonia-lyases active site

Phenylalanine ammonia-lyase (EC 4.3.1.5) (PAL) is a key enzyme of plant and fungi phenylpropanoid metabolism
which is involved in the biosynthesis of a wide variety of secondary metabolites such as flavanoids, furanocoumarin
phytoalexins and cell wall components. These compounds have many important roles in plants during normal growth
and in responses to environmental stress. PAL catalyzes the removal of an ammonia group from phenylalanine to form
trans-cinnamate.

Histidine ammonia-lyase (EC 4.3.1.3) (histidase) catalyzes the first step in histidine degradation, the removal of an
ammonia group from histidine to produce urocanic acid.

The two types of enzymes are functionally and structurally related [1]. They are the only enzymes which are known to
have the modified amino acid dehydroalanine (DHA) in their active site. A serine residue has been shown [2,3,4] to be
the precursor of this essential electrophilic moiety. The region around this active site residue is well conserved and
can be used as a signature pattern.

[1111] Consensus pattern: G-[STG]-[LIVM]-[STG]-[AC]-S-G-[DH]-L-x-P-L-[SA]-x(2)-[SA] [S is the active site residue]
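
Because the pattern annotates one specific residue (the serine that is the precursor of dehydroalanine), a scan can report the position of that residue rather than only the position of the whole match. The sketch below is illustrative only: the regular expression is an assumed reading of the OCR-damaged pattern, and the function name and toy sequence are hypothetical.

```python
import re

# Assumed reading of the PAL/histidase active-site pattern; the capture group
# marks the serine annotated as the active site residue.
PAL_HAL_ACTIVE_SITE = re.compile(r"G[STG][LIVM][STG][AC](S)G[DH]L.PL[SA].{2}[SA]")


def active_site_positions(sequence: str) -> list[int]:
    """Return 1-based positions of the annotated serine for every pattern match."""
    return [m.start(1) + 1 for m in PAL_HAL_ACTIVE_SITE.finditer(sequence)]


# Hypothetical toy sequence constructed to contain the motif once.
TOY_SEQUENCE = "MAGSISASGDLAPLSKKAQQ"
positions = active_site_positions(TOY_SEQUENCE)
```

Positions are reported 1-based, following the usual convention for residue numbering in protein sequences.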

[ 1] Taylor R.G., Lambert M.A., Sexsmith E., Sadler S.J., Ray P.N., Mahuran D.J., McInnes R.R. J. Biol. Chem. 265:18192-18199(1990).
[ 2] Langer M., Reck G., Reed J., Retey J. Biochemistry 33:6462-6467(1994).
[ 3] Schuster B., Retey J. FEBS Lett. 349:252-254(1994).
[ 4] Taylor R.G., McInnes R.R. J. Biol. Chem. 269:27473-27477(1994).

[1112] 394. PAS domain 

-!- CAUTION. This family does not currently match all known examples of PAS domains. 
PAS motifs appear in archaea, eubacteria and eukarya. Probably 
the most surprising identification of a PAS domain was that in
EAG-like K+-channels [1,3].
Number of members: 308 
[1] 

Medline: 97446881 

PAS domain S-boxes in archaea, bacteria and sensors for oxygen and redox. 
Zhulin IB. Taylor BL. Dixon R; 
Trends Biochem Sci 1997;22:331-333. 
[2]Medline: 95275818 

1.4 A structure of photoactive yellow protein, a cytosolic photoreceptor: unusual fold, active site, and chromophore.
Borgstahl GE, Williams DR, Getzoff ED; 
Biochemistry 1995;34:6278-6287. 
[3]Medline: 98044337 
PAS: a multifunctional domain family comes to light.
Ponting CP, Aravind L; 
Curr Biol 1997;7:674-677. 
[1113] 395. (PBP) Phosphatidylethanolamine-binding protein family signature

Mammalian phosphatidylethanolamine-binding protein (also known as basic cytosolic 21 Kd protein) is a 186 residue
protein found in a variety of tissues [1]. It binds hydrophobic ligands, such as phosphatidylethanolamine, but also seems
[2] to bind nucleotides such as GTP and FMN. It is suggested that it could act in membrane remodeling during growth
This family is part of complex I which catalyses the transfer of two electrons from NADH to ubiquinone in a reaction
that is associated with proton translocation across the membrane. Number of members: 1824
[1]
Medline: 93110040
The NADH:ubiquinone oxidoreductase (complex I) of respiratory chains. Walker JE;
Q Rev Biophys 1992;25:253-324.
[1100] 387. (oxidored_q3) NADH-ubiquinone/plastoquinone oxidoreductase chain 6. 179 members.
[1101] 388. (oxidored_q5) NADH-ubiquinone oxidoreductase chain 4, amino terminus
[1102] [1] Walker JE; Q Rev Biophys 1992;25:253-324.

[1103] 389. (oxidored_q6) Respiratory-chain NADH dehydrogenase 20 Kd subunit signature. Respiratory-chain NADH
dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or NADH-ubiquinone oxidoreductase) is an oligomeric
enzymatic complex located in the inner mitochondrial membrane which also seems to exist in the chloroplast and in
cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide subunits of this
bioenergetic enzyme complex there is one with a molecular weight of 20 Kd (in mammals) [3], which is a component of the
iron-sulfur (IP) fragment of the enzyme. It seems to bind a 4Fe-4S iron-sulfur cluster. The 20 Kd subunit has been
found to be:

- Nuclear encoded, as a precursor form with a transit peptide, in mammals and in Neurospora crassa.
- Mitochondrial encoded in Paramecium (gene psbG).
- Chloroplast encoded in various higher plants (gene ndhK or psbG).

The 20 Kd subunit is highly similar to [4]:

- Synechocystis strain PCC 6803 proteins psbG1 and psbG2.
- Subunit B of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoB).
- Subunit NQO6 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.
- Subunit 7 of Escherichia coli formate hydrogenlyase (gene hycG).
- Subunit I of Escherichia coli hydrogenase-4 (gene hyfI).

As a signature pattern a highly conserved region was selected, located in the central section of this subunit and which
contains a conserved cysteine that is probably involved in the binding of the 4Fe-4S center.

[1104] Consensus pattern: [GN]-x-D-[EAST]-[LIVMF](2)-P-[IV]-D-[LIVMFYW](2)-x-P-x-C-P-[PT] [The C is a putative
4Fe-4S ligand]

[ 1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987).
[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991).
[ 3] Arizmendi J.M., Runswick M.J., Skehel J.M., Walker J.E. FEBS Lett. 301:237-242(1992).
[ 4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H. J. Mol. Biol. 233:109-122(1993).

[1105] 390. p53 tumor antigen signature 

The p53 tumor antigen [1 to 5, E1, E2] is a protein found in increased amounts in a wide variety of transformed cells.
It is also detectable in many proliferating nontransformed cells, but it is undetectable or present at low levels in resting
cells. It is frequently mutated or inactivated in many types of cancer. p53 seems to act as a tumor suppressor in some,
but probably not all, tumor types. p53 is probably involved in cell cycle regulation, and may be a trans-activator that
acts to negatively regulate cellular division by controlling a set of genes required for this process.
p53 is a phosphoprotein of about 390 amino acids which can be subdivided into four domains: a highly charged acidic
region of about 75 to 80 residues, a hydrophobic proline-rich domain (position 80 to 150), a central region (from 150
to about 300), and a highly basic C-terminal region. The sequence of p53 is well conserved in vertebrate species;
attempts to identify p53 in other eukaryotic phyla have so far been unsuccessful.

As a signature pattern for p53 a perfectly conserved stretch of 13 residues located in the central region of the protein
was selected. This region, known as domain IV in [3], is involved (along with an adjacent region) in the binding of the
large T antigen of SV40. In man this region is the focus of a variety of point mutations in cancerous tumors.
[1106] Consensus pattern: M-C-N-S-S-C-M-G-G-M-N-R-R



[ 1] Levine A.J., Momand J., Finlay C.A. Nature 351:453-456(1991).
[ 2] Levine A.J., Momand J. Biochim. Biophys. Acta 1032:119-136(1990).
[ 3] Soussi T., Caron De Fromentel C., May P. Oncogene 5:945-952(1990).
[ 4] Lane D.P., Benchimol S. Genes Dev. 4:1-8(1990).
[ 3] Denhardt D.T., Guo X. FASEB J. 7:1475-1482(1993).
[1090] 382. Oxysterol-binding protein family signature 

A number of eukaryotic proteins that seem to be involved with sterol synthesis and/or its regulation have been found
[1] to be evolutionarily related:

- Mammalian oxysterol-binding protein (OSBP). A protein of about 800 amino-acid residues that binds a variety of
oxysterols: oxygenated derivatives of cholesterol. OSBP seems to play a complex role in the regulation of sterol
metabolism.
- Yeast proteins HES1 and KES1; highly related proteins of 434 residues that seem to play a role in ergosterol
synthesis.
- Yeast OSH1, a protein of 859 residues that also plays a role in ergosterol synthesis.
- Yeast hypothetical protein YHR001w (437 residues).
- Yeast hypothetical protein YHR073w (996 residues).
- Yeast hypothetical protein YKR003w (448 residues).

[1091] All these proteins contain a moderately conserved domain of about 250 residues located in the C-terminal
half of OSBP, OSH1 and YHR073w and in the central section of the other proteins. As a signature pattern, the best
conserved part of this domain was selected, a region that contains a conserved pentapeptide.
[1092] Consensus pattern: E-[KQ]-x-S-H-[HR]-P-P-x-[STACF]-A

[1093] [ 1] Jiang B., Brown J.L., Sheraton J., Fortin N., Bussey H. Yeast 10:341-353(1994).

[1094] 383. FMN oxidoreductase

[1095] 384. Oxidoreductase FAD/NAD-binding domain

Number of members: 250

[1]
Medline: 92084635
The sequence of squash NADH:nitrate reductase and its relationship to the sequences of other flavoprotein
oxidoreductases. A family of flavoprotein pyridine nucleotide cytochrome reductases.
Hyde GE, Crawford NM, Campbell W;
J Biol Chem 1991;266:23542-23547.

[2]Medline: 95111952
Crystal structure of the FAD-containing fragment of corn nitrate reductase at 2.5 A resolution: relationship to other
flavoprotein reductases.
Lu G, Campbell WH, Schneider G, Lindqvist Y;
Structure 1994;2:809-821.

[1096] 385. (oxidored_molyb) Eukaryotic molybdopterin oxidoreductases signature. A number of different eukaryotic
oxidoreductases that require and bind a molybdopterin cofactor have been shown [1] to share a few regions of sequence
similarity. These enzymes are:

- Xanthine dehydrogenase (EC 1.1.1.204), which catalyzes the oxidation of xanthine to uric acid with the concomitant
reduction of NAD. Structurally, this enzyme of about 1300 amino acids consists of at least three distinct domains:
an N-terminal 2Fe-2S ferredoxin-like iron-sulfur binding domain (see <PDOC00175>), a central FAD/NAD-binding
domain and a C-terminal Mo-pterin domain.

- Aldehyde oxidase (EC 1.2.3.1), which catalyzes the oxidation of aldehydes into acids. Aldehyde oxidase is highly
similar to xanthine dehydrogenase in its sequence and domain structure.

- Nitrate reductase (EC 1.6.6.1), which catalyzes the reduction of nitrate to nitrite. Structurally, this enzyme of about
900 amino acids consists of an N-terminal Mo-pterin domain, a central cytochrome b5-type heme-binding domain
(see <PDOC00170>) and a C-terminal FAD/NAD-binding cytochrome reductase domain.

- Sulfite oxidase (EC 1.8.3.1), which catalyzes the oxidation of sulfite to sulfate. Structurally, this enzyme of about
460 amino acids consists of an N-terminal cytochrome b5-binding domain followed by a Mo-pterin domain.

There are a few conserved regions in the sequence of the molybdopterin-binding domain of these enzymes. The pattern
used to detect these proteins is based on one of them. It contains a cysteine residue which could be involved in binding
the molybdopterin cofactor.

[1097] Consensus pattern: [GA]-x(3)-[KRNQHT]-x(11,14)-[LIVMFYWS]-x(8)-[LIVMF]-x-C-x(2)-[DEN]-R-x(2)-[DE]
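
This pattern contains a variable-length gap, x(11,14), so a match does not always span the same number of residues. As a rough illustration, and assuming the reading of the pattern given in the code (the source text is OCR-damaged), the shortest and longest possible match lengths can be computed by summing the repeat counts of the pattern elements. The function name `length_bounds` is hypothetical.

```python
import re

# Assumed reading of the molybdopterin oxidoreductase signature.
MOCO_PATTERN = (
    "[GA]-x(3)-[KRNQHT]-x(11,14)-[LIVMFYWS]-x(8)-[LIVMF]-x-C-x(2)-[DEN]-R-x(2)-[DE]"
)

# An element is some residue specification optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def length_bounds(pattern: str) -> tuple[int, int]:
    """Return (shortest, longest) number of residues a pattern match can span."""
    shortest = longest = 0
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        low = int(m.group(2)) if m.group(2) else 1   # no repeat means exactly one residue
        high = int(m.group(3)) if m.group(3) else low
        shortest += low
        longest += high
    return shortest, longest


bounds = length_bounds(MOCO_PATTERN)
```

The two bounds differ by exactly the width of the variable gap, which is a quick consistency check when a pattern has been transcribed by hand.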
[1098] [ 1] Wootton J.C., Nicolson R.E., Cock J.M., Walters D.E., Burke J.F., Doyle W.A., Bray R.C. Biochim. Biophys.
Acta 1057:157-185(1991).

[1099] 386. (oxidored_q1) NADH-Ubiquinone/plastoquinone (complex I), various chains
droplets (0.2 to 1.5 mu-m in diameter) containing mostly triacylglycerol that are surrounded by a phospholipid/oleosin
annulus. Oleosins may have a structural role in stabilizing the lipid body during desiccation of the seed by preventing
coalescence of the oil. They may also provide recognition signals for specific lipase anchorage in lipolysis during
seedling growth. Oleosins are found in the monolayer lipid/water interface of oil bodies and probably interact with both
the lipid and phospholipid moieties.

Oleosins are proteins of 16 Kd to 24 Kd and are composed of three domains: an N-terminal hydrophilic region of
variable length (from 30 to 60 residues); a central hydrophobic domain of about 70 residues and a C-terminal
amphipathic region of variable length (from 60 to 100 residues). The central hydrophobic domain is proposed to be made up
of beta-strand structure and to interact with the lipids [2]. It is the only domain whose sequence is conserved and
therefore a section from that domain was selected as a signature pattern.
[10821 Consensus pattern: [AGHSTl-x(2HAG^x(2HUV^fl^SAD^T-P-IUVMF](4)-F.S-P^UVI^^ 

[ 1] Murphy D.J., Keen J.N., O'Sullivan J.N., Au D.M.Y., Edwards E.-W., Jackson P.J., Cummins I., Gibbons T.,
Shaw C.M., Ryan A.J. Biochim. Biophys. Acta 1088:86-94(1991).
[ 2] Tzen J.T.C., Lie G.C., Huang A.H.C. J. Biol. Chem. 267:15626-15634(1992).

[1083] 379. (Orbi_VP5) Orbivirus outer capsid protein VP5

[1084] This paper shows the location of the different capsid proteins and their relation to each other.
[1085] [1] Schoehn G, Moss SR, Nuttall PA, Hewat EA; Virology 1997;235:191-200
[1086] 380. Orn/DAP/Arg decarboxylases family 2 signatures

Pyridoxal-dependent decarboxylases acting on ornithine, lysine, arginine and related substrates can be classified into
two different families on the basis of sequence similarities [1,2,3]. The second family consists of:

- Eukaryotic ornithine decarboxylase (EC 4.1.1.17) (ODC). ODC catalyzes the transformation of ornithine into
putrescine.

- Prokaryotic diaminopimelic acid decarboxylase (EC 4.1.1.20) (DAPDC). DAPDC catalyzes the conversion of
diaminopimelic acid into lysine, the last step in the biosynthesis of lysine.

" S V ^IJteJ^oTAfSc ^^"^ ^^'^ "'^^'^ '^'^^ '"^ biosynthesis of tabtoxin and is 

- Bacterial and plant biosynthetic arginine decarboxylase (EC 4.1.1.19) (ADC). ADC catalyzes the transformation
of arginine into agmatine, the first step in the biosynthesis of putrescine from arginine.

The above proteins, while most probably evolutionarily related, do not share extensive regions of sequence similarities.
Two of the conserved regions were selected as signature patterns. The first pattern contains a conserved lysine residue
which is known, in mouse ODC [4], to be the site of attachment of the pyridoxal-phosphate group. The second pattern
contains a stretch of three consecutive glycine residues and has been proposed to be part of a substrate-binding region.

These enzymes are collectively known as group IV decarboxylases [3].

t^p^Hd^x^pT^^^^^^^ IK is 

Consensus pattern: [GS]-x(2,6)-[LIVMSCP]-x(2)-[LIVMF]-[DNS]-[LIVMCA]-G-G-G-[LIVMFY]-[GSTPCEQ]

[ 1] Bairoch A. Unpublished observations (1993).
[ 2] Martin C., Cami B., Yeh P., Stragier P., Parsot C., Patte J.-C. Mol. Biol. Evol. 5:549-559(1988).
[ 3] Sandmeier E., Hale T.I., Christen P. Eur. J. Biochem. 221:997-1002(1994).
[ 4] Poulin R., Lu L., Ackermann B., Bey P., Pegg A.E. J. Biol. Chem. 267:150-158(1992).
[ 5] Moore R.C., Boyle S.M. J. Bacteriol. 172:4631-4640(1990).

[1088] 381. Osteopontin signature 

Osteopontin is an acidic phosphorylated glycoprotein of about 40 Kd which is abundant in the mineral matrix of bones
and which binds tightly to hydroxyapatite [1,2,3]. It is suggested that osteopontin might function as a cell attachment
factor and could play a key role in the adhesion of osteoclasts to the mineral matrix of bone.
Osteopontin-K is a kidney protein which is highly similar to osteopontin and probably also involved in cell adhesion.
As a signature pattern, a conserved region located at the N-terminal extremity of the mature protein was selected.

[1089] Consensus pattern: [KQ]-x-[TA]-x(2)-[GA]-S-S-E-E-K

[ 1] Butler W.T. Connect. Tissue Res. 23:123-136(1989).
[ 2] Gorski J.P. Calcif. Tissue Int. 50:391-396(1992).
[1076] 375. Orotidine 5'-phosphate decarboxylase active site

Orotidine 5'-phosphate decarboxylase (EC 4.1.1.23) (OMPdecase) [1,2] catalyzes the last step in the de novo
biosynthesis of pyrimidines, the decarboxylation of OMP into UMP. In higher eukaryotes OMPdecase is part, with orotate
phosphoribosyltransferase, of a bifunctional enzyme, while the prokaryotic and fungal OMPdecases are monofunctional
proteins. Some parts of the sequence of OMPdecase are well conserved across species. The best conserved region is
located in the N-terminal half of OMPdecases and is centered around a lysine residue which is essential for the catalytic
function of the enzyme. This region has been developed as a signature pattern.

[1077] Consensus pattern: [LIVMFTA]-[LIVMF]-x-D-x-K-x(2)-D-I-[GP]-x-T-[LIVMTA] [K is the active site residue]

[ 1] Jacquet M., Guilbaud R., Garreau H. Mol. Gen. Genet. 211:441-445(1988).
[ 2] Kimsey H.H., Kaiser D. J. Biol. Chem. 267:819-824(1992).

[1078] 376. ATP synthase delta (OSCP) subunit signature

ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1,2] is a component of the cytoplasmic membrane of
eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex is
composed of an oligomeric transmembrane sector, called CF(0), which acts as a proton channel, and a catalytic core,
termed coupling factor CF(1).

One of the subunits of the ATPase complex, known as subunit delta in bacteria and chloroplasts or the Oligomycin
Sensitivity Conferral Protein (OSCP) in mitochondria, seems to be part of the stalk that links CF(0) to CF(1). It either
transmits conformational changes from CF(0) into CF(1) or is involved in proton conduction [3].
The different delta/OSCP subunits are proteins of approximately 200 amino-acid residues - once the transit peptide
has been removed in the chloroplast and mitochondrial forms - which show only moderate sequence homology.
The signature pattern used to detect ATPase delta/OSCP subunits is based on a conserved region in the C-terminal
section of these proteins.

[1079] Consensus pattern: [LIVM]-x-[LIVMFYT]-x(3)-[LIVMT]-[DENQK]-x(2)-[LIVM]-x-[GSA]-G-[LIVMFYGA]-x-
[LIVM]-[KRHENQ]-x-[GSEN]

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[ 3] Engelbrecht S., Junge W. Biochim. Biophys. Acta 1015:379-390(1990).
[1 080] 377. Aspartate and ornithine carbamoyltransf erases signature 

Aspartate carbamoyltransferase (EC 2.1.3.2) (ATCase) catalyzes the conversion of aspartate and carbamoyl phos- 
phate to carbamoy Aspartate, the second step in the de novo biosynthesis of pyrimidine nucleotides [1]. In prokaryotes 
ATCase consists of two subunits: a catalytic chain (gene pyrB) and a regulatory chain (gene pyri), while in eukaryotes 
It IS a domain in a multi-functbnal enzyme (called URA2 in yeast, rudimentary in Drosophila. and CAD in mammals 
[2]) that also catalyzes other steps of the biosynthesis of pyrimidines. 

Ornithine carbamoyltransferase (EC 2.1.3.3) (OTCase) catalyzes the conversion of ornithine and carbamoyl phosphate
to citrulline. In mammals this enzyme participates in the urea cycle [3] and is located in the mitochondrial matrix. In
prokaryotes and eukaryotic microorganisms it is involved in the biosynthesis of arginine. In some bacterial species it
is also involved in the degradation of arginine [4] (the arginine deaminase pathway).

It has been shown [5] that these two enzymes are evolutionarily related. The predicted secondary structures of both
enzymes are similar and there are some regions of sequence similarity. One of these regions includes three residues
which have been shown, by crystallographic studies [6], to be implicated in binding the phosphoryl group of carbamoyl
phosphate.

This region was selected as a signature for these enzymes. 

Consensus pattern: F-x-[EK]-x-S-[GT]-R-T [S, R, and the 2nd T bind carbamoyl phosphate]

Note: the residue in position 3 of the pattern makes it possible to distinguish between an ATCase (Glu) and an OTCase (Lys).
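
The note above can be illustrated with a short sketch. It assumes the pattern is applied as an ordinary regular expression to a one-letter amino-acid string; the function name and the example sequences in the usage note are hypothetical and are not taken from any real protein.

```python
import re

# Regular-expression form of the pattern F-x-[EK]-x-S-[GT]-R-T.
# Position 3 is captured so it can be inspected after a match.
SIGNATURE = re.compile(r"F.([EK]).S[GT]RT")


def classify_carbamoyltransferase(sequence):
    """Return 'ATCase', 'OTCase', or None if the signature is absent."""
    match = SIGNATURE.search(sequence.upper())
    if match is None:
        return None
    # Glu (E) in position 3 indicates an ATCase, Lys (K) an OTCase.
    return "ATCase" if match.group(1) == "E" else "OTCase"
```

For example, a made-up fragment such as "MMFAEPSTRTAA" would be reported as an ATCase and "AAFHKASGRTLL" as an OTCase; a sequence without the signature yields None.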

[ 1] Lerner C.G., Switzer R.L. J. Biol. Chem. 261:11156-11165(1986).

[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).

[ 3] Takiguchi M., Matsubasa T., Amaya Y., Mori M. BioEssays 10:163-166(1989).

[ 4] Baur H., Stalon V., Falmagne P., Luethi E., Haas D. Eur. J. Biochem. 166:111-117(1987).

[ 5] Houghton J.E., Bencini D.A., O'Donovan G.A., Wild J.R. Proc. Natl. Acad. Sci. U.S.A. 81:4864-4868(1984).

[ 6] Ke H.-M., Honzatko R.B., Lipscomb W.N. Proc. Natl. Acad. Sci. U.S.A. 81:4037-4040(1984).

[1081] 378. Oleosins signature

Oleosins [1] are the proteinaceous components of plants' lipid storage bodies called oil bodies. Oil bodies are small 
[1071] 373. NusB family

[1072] The NusB protein is involved in the regulation of rRNA biosynthesis by transcriptional antitermination.

[1073] Huenges M, Rolz C, Gschwind R, Peteranderl R, Berglechner F, Richter G, Bacher A, Kessler H, Gemmecker G.
EMBO J 1998;17:4092-4100.

[1074] 374. (Neur_chan) Neurotransmitter-gated ion-channels signature

Neurotransmitter-gated ion-channels [1,2,3,4] provide the molecular basis for rapid signal transmission at chemical
synapses. They are post-synaptic oligomeric transmembrane complexes that transiently form an ionic channel upon the
binding of a specific neurotransmitter. Presently, the sequences of subunits from five types of neurotransmitter-gated
receptors are known: - The nicotinic acetylcholine receptor (AchR), an excitatory cation channel. In the motor endplates
of vertebrates, it is composed of four different subunits (alpha, beta, gamma and delta or epsilon) with a molar
stoichiometry of 2:1:1:1. In neurones, the AchR receptor is composed of two different types of subunits: alpha and non-alpha
(also called beta). Nicotinic AchRs are also found in invertebrates. - The glycine receptor, an inhibitory chloride ion
channel. The glycine receptor is a pentamer composed of two different subunits (alpha and beta). - The
gamma-aminobutyric-acid (GABA) receptor, which is also an inhibitory chloride ion channel. The quaternary structure of the
GABA receptor is complex; at least four classes of subunits are known to exist (alpha, beta, gamma, and delta) and
there are many variants in each class (for example: six variants of the alpha class have already been sequenced). -
The serotonin 5HT3 receptor. Serotonin is a biogenic hormone that functions as a neurotransmitter, a hormone and a
mitogen. There are seven major groups of serotonin receptors; six of these groups (5HT1, 5HT2, and 5HT4 to 5HT7)
transduce extracellular signals by activating G proteins, while 5HT3 is a ligand-gated cation-specific ion channel which,
when activated, causes fast, depolarizing responses in neurons. - The glutamate receptor, an excitatory cation channel.
Glutamate is the main excitatory neurotransmitter in the brain. At least three different types of glutamate receptors
have been described and are named according to their selective agonists (kainate, N-methyl-D-aspartate (NMDA) and
quisqualate). All known sequences of subunits from neurotransmitter-gated ion-channels are structurally related. They
are composed of a large extracellular glycosylated N-terminal ligand-binding domain, followed by three hydrophobic
transmembrane regions which form the ionic channel, followed by an intracellular region of variable length. A fourth
hydrophobic region is found at the C-terminal of the sequence. The sequences of subunits from the AchR, GABA, 5HT3,
and Gly receptors are clearly evolutionarily related and share many regions of sequence similarity. These sequence
similarities are either absent or very weak in the Glu receptors. In the N-terminal extracellular domain of AchR/GABA/
5HT3/Gly receptors, there are two conserved cysteine residues which, in AchR, have been shown to form a disulfide
bond essential to the tertiary structure of the receptor. A number of amino acids between the two disulfide-bonded
cysteines are also conserved. Therefore this region was used as a signature pattern for this subclass of proteins.

[1075] Consensus pattern: C-x-[LIVMFQ]-x-[LIVMF]-x(2)-[FY]-P-x-D-x(3)-C [The two C's are linked by a disulfide
bond]

[ 1] Stroud R.M., McCarthy M.P., Shuster M. Biochemistry 29:11009-11023(1990).
[ 2] Betz H. Neuron 5:383-392(1990).

[ 3] Dingledine R., Myers S.J., Nicholas R.A. FASEB J. 4:2632-2645(1990).
[ 4] Barnard E.A. Trends Biochem. Sci. 17:368-374(1992).
[1066] 371. NifU-like domain

[1067] This is an alignment of the carboxy-terminal domain. This is the only common region between the NifU protein
from nitrogen-fixing bacteria and rhodobacterial species. The biochemical function of NifU is unknown [1].
[1068] Ouzounis C, Bork P, Sander C, Trends Biochem Sci 1994;19:199-200.
[1069] 372. Nitrilases / cyanide hydratase signatures

Nitrilases (EC 3.5.5.1) are enzymes that convert nitriles into their corresponding acids and ammonia. They are
widespread in microbes as well as in plants, where they convert indole-3-acetonitrile to the hormone indole-3-acetic acid.
A conserved cysteine has been shown [1,2] to be essential for enzyme activity; it seems to be involved in a nucleophilic
attack on the nitrile carbon atom. Cyanide hydratase (EC 4.2.1.66) converts HCN to formamide. In phytopathogenic
fungi, it is used to avoid the toxic effect of cyanide released by wounded plants [3]. The sequence of cyanide hydratase
is evolutionarily related to that of nitrilases. Yeast hypothetical proteins YIL164c and YIL165c also belong to this family.
As signature patterns for these enzymes, two conserved regions were selected. The first is located in the N-terminal
section while the second, which contains the active site cysteine, is located in the central section.
[1070] Consensus pattern: G-x(2)-[LIVMFY](2)-x-[IF]-x-E-x(2)-[LIVM]-x-G-Y-P

Consensus pattern: G-[GAQ]-x(2)-C-[WA]-E-[NH]-x(2)-[PST]-[LIVMFYS]-x-[KR] [C is the active site residue]

[ 1] Kobayashi M., Izui H., Nagasawa T., Yamada H. Proc. Natl. Acad. Sci. U.S.A. 90:247-251(1993).
[ 2] Kobayashi M., Komeda K., Yanaka N., Nagasawa T., Yamada H. J. Biol. Chem. 267:20746-20751(1992).
[ 3] Wang P., Vanetten H.D. Biochem. Biophys. Res. Commun. 187:1048-1054(1992).
[ 1] Campbell W.H., Kinghorn J.R. Trends Biochem. Sci. 15:315-319(1990).
[ 2] Crane B.R., Siegel L.M., Getzoff E.D. Science 270:59-67(1995).

[ 3] Gisselmann G., Klausmeier P., Schwenn J.D. Biochim. Biophys. Acta 1144:102-106(1993).
[ 4] Huang C.J., Barrett E.L. J. Bacteriol. 173:1544-1553(1991).

[1055] 367. (NMT) Myristoyl-CoA:protein N-myristoyltransferase signatures. Myristoyl-CoA:protein
N-myristoyltransferase (EC 2.3.1.97) (Nmt) [1] is the enzyme responsible for transferring a myristate group on
the N-terminal glycine of a number of cellular eukaryotic and viral proteins. Nmt is a monomeric protein of about 50 to
60 Kd whose sequence appears to be well conserved. Two highly conserved regions have been developed as signature
patterns. The first one is located in the central section, the second in the C-terminal part.

[1056] Consensus pattern: E-I-N-F-L-C-x-H-K

Consensus pattern: K-F-G-x-G-D-G
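
Consensus patterns written in this notation can be turned into regular expressions mechanically. The following is a minimal sketch, assuming only the notation elements used in this description: a single residue, `x` for any residue, `[...]` for alternatives, `{...}` for excluded residues, and an optional `(n)` or `(n,m)` repeat count. The function name is hypothetical, and the trailing hyphen that some patterns carry is ignored.

```python
import re

# One pattern element: a core symbol followed by an optional repeat count.
_ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip("-").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."                       # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            regex = core                      # literal residue or [alternatives]
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)
```

With this sketch, the two Nmt patterns above become `EINFLC.HK` and `KFG.GDG`, which can then be passed to `re.search` against a protein sequence in one-letter code.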

[1057] [ 1] Rudnick D.A., McWherter C.A., Gokel G.W., Gordon J.I. Adv. Enzymol. 67:375-430(1993).
[1058] 368. ADP-glucose pyrophosphorylase signatures (NTP_transferase)

[1059] ADP-glucose pyrophosphorylase (glucose-1-phosphate adenylyltransferase) [1,2] (EC 2.7.7.27) catalyzes a
very important step in the biosynthesis of alpha 1,4-glucans (glycogen or starch) in bacteria and plants: synthesis of
the activated glucosyl donor, ADP-glucose, from glucose-1-phosphate and ATP. ADP-glucose pyrophosphorylase is a
tetrameric allosterically regulated enzyme. It is a homotetramer in bacteria while in plant chloroplasts and amyloplasts,
it is a heterotetramer of two different, yet evolutionarily related, subunits. There are a number of conserved regions in
the sequence of bacterial and plant ADP-glucose pyrophosphorylase subunits. Three of these regions were selected
as signature patterns. The first two are N-terminal and have been proposed to be part of the allosteric and/or
substrate-binding sites in the Escherichia coli enzyme (gene glgC). The third pattern corresponds to a conserved region in the
central part of the enzymes.

[1060] Consensus pattern: [AG]-G-G-x-G-[STK]-x-L-x(2)-L-[TA]-x(3)-A-x-P-A-[LV]

Consensus pattern: W-[FY]-x-G-[ST]-A-[DNSH]-[AS]-[LIVMFYW]

Consensus pattern: [APV]-[GS]-M-G-[LIVMN]-Y-[IVC]-[LIVMFY]-x(2)-[DENPHK]

[ 1] Nakata P.A., Greene T.W., Anderson J.M., Smith-White B.J., Okita T.W., Preiss J. Plant Mol. Biol. 17:1089-1093
(1991).

[ 2] Preiss J., Ball K., Hutney J., Smith-White B.J., Li L., Okita T.W. Pure Appl. Chem. 63:535-544(1991).
[1061] 369. Sodium/hydrogen exchanger family 

[1062] Na/H antiporters are key transporters in maintaining the pH of actively metabolizing cells. The molecular
mechanisms of antiport are unclear.

These antiporters contain 10-12 transmembrane regions (M) at the amino-terminus and a large cytoplasmic region at
the carboxyl terminus. The transmembrane regions M3-M12 share identity with other members of the family. The M6
and M7 regions are highly conserved. Thus, this is thought to be the region that is involved in the transport of sodium
and hydrogen ions. The cytoplasmic region has little similarity throughout the family.

[1063] [1] Dibrov P, Fliegel L; FEBS Lett 1998;424:1-5. [2] Orlowski J, Grinstein S; J Biol Chem 1997;272:
22373-22376. [3] Numata M, Petrecca K, Lake N, Orlowski J; J Biol Chem 1998;273:6951-6959.
[1064] 370. Sodium:sulfate symporter family signature (Na_sulph_symp) 

Integral membrane proteins that mediate the intake of a wide variety of molecules with the concomitant uptake of
sodium ions (sodium symporters) can be grouped, on the basis of sequence and functional similarities, into a number
of distinct families. One of these families currently consists of the following proteins: - Mammalian sodium/sulfate
cotransporter [1]. - Mammalian renal sodium/dicarboxylate cotransporter [2], which transports succinate and citrate. -
Mammalian intestinal sodium/dicarboxylate cotransporter. - Chlamydomonas reinhardtii putative sulfur deprivation
response regulator SAC1 [3]. - Caenorhabditis elegans hypothetical proteins B0285.6, F31F6.6, K08E5.2 and R107.1.
- Escherichia coli hypothetical protein yfbS. - Haemophilus influenzae hypothetical protein HI0608. - Synechocystis
strain PCC 6803 hypothetical protein sll0640. - Methanococcus jannaschii hypothetical protein MJ0672. These
transporters are proteins of from 430 to 620 amino acids which are highly hydrophobic and which probably contain about
12 transmembrane regions. As a signature pattern, a conserved region was selected which is located in or near the
penultimate transmembrane region.

[1065] Consensus pattern: [STACP]-S-x(2)-F-x(2)-P-[LIVM]-[GSA]-x(3)-N-x-[LIVM]-V

[ 1] Markovich D., Forgo J., Stange G., Biber J., Murer H. Proc. Natl. Acad. Sci. U.S.A. 90:8073-8077(1993).

[ 2] Pajor A.M. Am. J. Physiol. 270:642-648(1996).

[ 3] Davies J.P., Yildiz F.H., Grossman A. EMBO J. 15:2150-2159(1996).
[ 8] Ju Q., Morrow B.E., Warner J.R. Mol. Cell. Biol. 10:5226-5234(1990).
[ 9] Klempnauer K.-H., Sippel A.E. EMBO J. 6:2719-2725(1987).

[1044] 362. NAD-dependent glycerol-3-phosphate dehydrogenase signature

NAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.1.8) (GPD) catalyzes the reversible reduction of
dihydroxyacetone phosphate to glycerol-3-phosphate. It is a eukaryotic cytosolic homodimeric protein of about 40 Kd. As
a signature pattern, a glycine-rich region that is probably [1] involved in NAD-binding was selected.

[1045] Consensus pattern: G-[AT]-[LIVM]-K-[DN]-[LIVM](2)-A-x-[GA]-x-G-[LIVMF]-x-[DE]-G-[LIVM]-x-[LIVMFYW]-

[1046] [ 1] Otto J., Argos P., Rossmann M.G. Eur. J. Biochem. 109:325-330(1980).
[1047] 363. Nucleosome assembly protein (NAP) 

[1048] It is thought that NAPs may be involved in regulating gene expression as a result of histone accessibility [1].

[1] Rodriguez P, Munroe D, Prawitt D, Chu LL, Bric E, Kim J, Reid LH, Davies C, Nakagama H, Loebbert R,
Winterpacht A, Petruzzi MJ, Higgins MJ, Nowak N, Evans G, Shows T, Weissman BE, Zabel B, Housman DE,
Pelletier J; Genomics 1997;44:253-265.

[2] Schnieders F, Dork T, Arnemann J, Vogel T, Werner M, Schmidtke J; Hum Mol Genet 1996;5:1801-1807.

[1049] 364. NB-ARC domain

van der Biezen EA, Jones JD. Curr Biol 1998;8:226-227.
[1050] 365. Nucleoside diphosphate kinases active site 

[1051] Nucleoside diphosphate kinases (EC 2.7.4.6) (NDK) [1] are enzymes required for the synthesis of nucleoside
triphosphates (NTP) other than ATP. They provide NTPs for nucleic acid synthesis, CTP for lipid synthesis, UTP for
polysaccharide synthesis and GTP for protein elongation, signal transduction and microtubule polymerization. In
eukaryotes, there seems to be a small family of NDK isozymes, each of which acts in a different subcellular compartment
and/or has a distinct biological function. Eukaryotic NDK isozymes are hexamers of two highly related chains (A and
B) [2]. By random association (A6, A5B ... AB5, B6), these two kinds of chain form isoenzymes differing in their isoelectric
point. NDK are proteins of 17 Kd that act via a ping-pong mechanism in which a histidine residue is phosphorylated
by transfer of the terminal phosphate group from ATP. In the presence of magnesium, the phosphoenzyme can transfer
its phosphate group to any NDP, to produce an NTP. NDK isozymes have been sequenced from prokaryotic and
eukaryotic sources. It has also been shown [3] that the Drosophila awd (abnormal wing discs) protein is a
microtubule-associated NDK. Mammalian NDK is also known as metastasis inhibition factor nm23. The sequence of NDK has been
highly conserved through evolution. There is a single histidine residue conserved in all known NDK isozymes, which
is involved in the catalytic mechanism [2]. Our signature pattern contains this residue.
[1052] Consensus pattern: N-x(2)-H-[GA]-S-D-[SA]-[LIVMPKNE] [H is the putative active site residue]
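
The series of isoenzymes formed by random association of A and B chains into hexamers (A6, A5B ... AB5, B6) can be enumerated with a short sketch. It only lists subunit compositions; the function names are hypothetical, and no claim is made about the relative abundance of each form.

```python
def _part(letter, count):
    """Format one chain type, omitting a count of 1 (for example 'A5B')."""
    if count == 0:
        return ""
    return letter if count == 1 else f"{letter}{count}"


def hexamer_compositions(subunits=6):
    """List the A/B subunit compositions of an oligomer, from all-A to all-B."""
    return [
        _part("A", subunits - n_b) + _part("B", n_b)
        for n_b in range(subunits + 1)
    ]
```

Calling `hexamer_compositions()` gives seven compositions, one for each possible number of B chains from zero to six.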

[ 1] Parks R.E., Agarwal R.P. (In) The Enzymes (3rd edition) 8:307-334(1973).

[ 2] Gilles A.-M., Presecan E., Vonica A., Lascu I. J. Biol. Chem. 266:8784-8789(1991).

[ 3] Biggs J., Hersperger E., Steeg P.S., Liotta L.A., Shearn A. Cell 63:933-940(1990).

[1053] 366. Nitrite and sulfite reductases iron-sulfur/siroheme-binding site (NIR_SIR). Nitrite reductases (NiR) [1]
catalyze the reduction of nitrite into ammonium, the second step in the assimilation of nitrate. There are two types of
NiR: the higher plant chloroplastic form of NiR (EC 1.7.7.1) is a monomeric protein that uses reduced ferredoxin as
the electron donor, while fungal and bacterial NiR (EC 1.6.6.4) are homodimeric proteins that use NAD(P)H as the
electron donor. Both forms of NiR contain a siroheme-Fe and iron-sulfur centers. Sulfite reductase (NADPH)
(SiR) [2] is the bacterial enzyme that catalyzes the reduction of sulfite to sulfide. SiR is an oligomeric enzyme
with a subunit composition of alpha(8)-beta(4); the alpha component is a flavoprotein (SiR-FP), while the beta
component is a siroheme, iron-sulfur protein (SiR-HP). Sulfite reductase (ferredoxin) (EC 1.8.7.1) [3] is a cyanobacterial
and plant monomeric enzyme that also catalyzes the reduction of sulfite to sulfide. Anaerobic sulfite reductase (EC
1.8.1.-) (ASR) [4] is a bacterial enzyme that catalyzes the NADH-dependent reduction of sulfite to sulfide. ASR is an
oligomeric enzyme composed of three different subunits. The C component (gene asrC) seems to be a siroheme
iron-sulfur protein. These enzymes share a region of sequence similarity in their C-terminal half; this region, which spans
about 80 amino acids, includes four conserved cysteine residues. Two of the Cys are grouped together at the beginning
of the domain, and the two others are grouped in the middle of the domain. The cysteines are involved in the binding
of the iron-sulfur center; the last one also binds the siroheme group [2]. A signature pattern from the region around the
second cluster of cysteines was derived.

[1054] Consensus pattern: [STV]-G-C-x(3)-C-x(6)-[DE]-[LIVMF]-[GAT]-[LIVMF] [The two C's are iron-sulfur ligands]
- Escherichia coli hypothetical protein ygdU and HI0901, the corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein yjaD and HI0432, the corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein yrfE.
- Bacillus subtilis hypothetical protein yqkG.
- Bacillus subtilis hypothetical protein yzgO.
- Yeast hypothetical protein YGL067w.

[1040] It is proposed [2] that the conserved domain could be involved in the active center of a family of
pyrophosphate-releasing NTPases. As a signature pattern the core region of the domain was selected; it contains four conserved
glutamate residues.

[1041] Consensus pattern: G-x(5)-E-x(4)-[STAGC]-[LIVMAC]-x-R-E-[LIVMFT]-x-E-E

[1] Michaels M.L., Miller J.H. J. Bacteriol. 174:6321-6325(1992).
[2] Koonin E.V. Nucleic Acids Res. 21:4847-4847(1993).

[3] Mejean V., Salles C., Bullions M.J., Bessman M.J., Claverys J.-P. Mol. Microbiol. 11:323-330(1994).

[4] Sakumi K., Furuichi M., Tsuzuki T., Kakuma T., Kawabata S., Maki H., Sekiguchi M. J. Biol. Chem. 268:
23524-23530(1993).

[5] Thorne N.M.H., Hankin S., Wilkinson M.C., Nunez C., Barraclough R., McLennan A.G. Biochem. J. 311:717-721
(1995).

[1042] 361. Myb DNA-binding domain repeat signatures

The retroviral oncogene v-myb, and its cellular counterpart c-myb, encode nuclear DNA-binding proteins that
specifically recognize the sequence YAAC(G/T)G [1]. The myb family also includes the following proteins: - Drosophila
D-myb [2]. - Vertebrate myb-like proteins A-myb and B-myb [3]. - Maize C1 protein, a trans-acting factor which controls
the expression of genes involved in anthocyanin biosynthesis. - Maize P protein [4], a trans-acting factor which regulates
the biosynthetic pathway of a flavonoid-derived pigment in certain floral tissues. - Arabidopsis thaliana protein GL1 [5],
required for the initiation of differentiation of leaf hair cells (trichomes). - A number of myb/c1-related proteins in maize
and barley, whose roles are not yet known [6]. - Yeast BAS1 [7], a transcriptional activator for the HIS4 gene. - Yeast
REB1 [8], which recognizes sites within both the enhancer and the promoter of rRNA transcription, as well as upstream
of many genes transcribed by RNA polymerase II. - Fission yeast cdc5, a possible transcription factor whose activity
is required for cell cycle progression and growth during G2. - Fission yeast myb1, which regulates telomere length and
function. - Yeast hypothetical protein YMR213w. One of the most conserved regions in all of these proteins is a domain
of 160 amino acids. It consists of three tandem repeats of 51 to 53 amino acids. In myb, this repeat region has been
shown [9] to be involved in DNA-binding. The major part of the first repeat is missing in retroviral v-myb sequences
and in plant myb-related proteins. Yeast REB1 differs from the other proteins in this family in having a single myb-like
domain. As shown in the following schematic representation, two signature patterns for myb-like domains were
developed; the first is located in the N-terminal section, the second spans the C-terminal extremity of the domain.

xxxxxxxxxWxxxEDxxxxxxxxxxxxxx WxxIxxxxxxRxxxxxxxxWxxxx 

[1043] Consensus pattern: W-[ST]-x(2)-E-[DE]-x(2)-[LIV]
Consensus pattern: W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM]

Note: this pattern detects the three copies of the domain in myb, D-myb, A-myb and B-myb; the second of the two
complete copies of plant myb-related proteins; and the last two copies of yeast BAS1.
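
Because the myb domain is repeated, a sequence can contain several hits for each signature. A minimal sketch of such a scan follows, assuming the two patterns above are written by hand as regular expressions; the names `MYB_SIGNATURES` and `find_myb_hits` are hypothetical, and positions are reported 1-based and inclusive.

```python
import re

# Hand-translated regular expressions for the two myb signature patterns.
MYB_SIGNATURES = {
    "myb_1": re.compile(r"W[ST].{2}E[DE].{2}[LIV]"),
    "myb_2": re.compile(r"W.{2}[LI][SAG].{4,5}R.{8}[YW].{3}[LIVM]"),
}


def find_myb_hits(sequence):
    """Return (signature name, start, end) for every non-overlapping hit."""
    hits = []
    for name, regex in MYB_SIGNATURES.items():
        for match in regex.finditer(sequence.upper()):
            # Convert Python's 0-based, end-exclusive span to 1-based inclusive.
            hits.append((name, match.start() + 1, match.end()))
    return sorted(hits, key=lambda hit: hit[1])
```

A protein with three myb repeats would be expected to yield up to three hits per signature; a protein such as REB1, with a single myb-like domain, at most one.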

[ 1] Biedenkapp H., Borgmeyer U., Sippel A.E., Klempnauer K.-H. Nature 335:835-837(1988).

[ 2] Peters C.W.B., Sippel A.E., Vingron M., Klempnauer K.-H. EMBO J. 6:3085-3090(1987).

[ 3] Nomura N., Takahashi M., Matsui M., Ishii S., Date T., Sasamoto S., Ishizaki R. Nucleic Acids Res. 16:
11075-11090(1988).

[ 4] Grotewold E., Athma P., Peterson T. Proc. Natl. Acad. Sci. U.S.A. 88:4587-4591(1991).

[ 5] Oppenheimer D.G., Herman P.L., Sivakumaran S., Esch J., Marks M.D. Cell 67:483-493(1991).

[ 6] Marocco A., Wissenbach M., Becker D., Paz-Ares J., Saedler H., Salamini F., Rohde W. Mol. Gen. Genet. 216:
183-187(1989).

[ 7] Tice-Baldwin K., Fink G.R., Arndt K.T. Science 246:931-935(1989).
[1037] 359. Prokaryotic molybdopterin oxidoreductases signatures (molybdopterin)

A number of different prokaryotic oxidoreductases that require and bind a molybdopterin cofactor have been shown
[1,2,3] to share a number of regions of sequence similarity. These enzymes are: - Escherichia coli respiratory nitrate
reductase (EC 1.7.99.4). This enzyme complex allows the bacteria to use nitrate as an electron acceptor during
anaerobic growth. The enzyme is composed of three different chains: alpha, beta and gamma. The alpha chain (gene narG)
is the molybdopterin-binding subunit. Escherichia coli encodes a second, closely related, nitrate reductase complex
which also contains a molybdopterin-binding alpha chain (gene narZ). - Escherichia coli anaerobic dimethyl sulfoxide
reductase (DMSO reductase). DMSO reductase is the terminal reductase during anaerobic growth on various sulfoxide
and N-oxide compounds. DMSO reductase is composed of three chains: A, B and C. The A chain (gene dmsA) binds
molybdopterin. - Escherichia coli biotin sulfoxide reductases (genes bisC and bisZ). This enzyme reduces a
spontaneous oxidation product of biotin, BDS, back to biotin. It may serve as a scavenger, allowing the cell to use biotin
sulfoxide as a biotin source. - Methanobacterium formicicum formate dehydrogenase (EC 1.2.1.2). The alpha chain
(gene fdhA) of this dimeric enzyme binds a molybdopterin cofactor. - Escherichia coli formate dehydrogenases -H
... enzymes are responsible for the oxidation of formate to carbon
dioxide. In addition to molybdopterin, the alpha (catalytic) subunit also contains an active site selenocysteine. -
Wolinella succinogenes polysulfide reductase chain. This enzyme is a component of the phosphorylative electron transport
system with polysulfide as the terminal acceptor. It is composed of three chains: A, B and C. The A chain (gene psrA)
binds molybdopterin. - Salmonella typhimurium thiosulfate reductase (gene phsA). - Escherichia coli
trimethylamine-N-oxide reductase (gene torA) [4]. - Nitrate reductase (EC 1.7.99.4) from Klebsiella pneumoniae (gene
nasA), Alcaligenes eutrophus, Escherichia coli, Rhodobacter sphaeroides, Thiosphaera pantotropha (gene napA) and
Synechococcus PCC 7942 (gene narB). These proteins range from 715 amino acids (fdhF) to 1246 amino acids (narZ)
in size. Three signature patterns for these enzymes were derived. The first is based on a conserved region in the
N-terminal section and contains two cysteine residues perhaps involved in binding the molybdopterin cofactor. It should
be noted that this region is not present in bisC. The second pattern is derived from a conserved region located in the
central part of these enzymes.

Consensus pattern: [STAN]-x-[CH]-x(2,3)-C-[STAG]-[GSTVMF]-x-C-x-[LIVMFYW]-x-[LIVMA]-x(3,4)-...

Consensus pattern: [STA]-x-[STAC](2)-x(2)-[STA]-D-[LIVMY](2)-L-P-x-[STAC](2)-x(2)-E

Consensus pattern: A-x(3)-[GDT]-I-x-[DNQTK]-x-[DEA]-x-[LIVM]-x-[LIVMC]-x-[NS]-x(2)-[GS]-x(5)-A-x-[LIVM]-[ST]

si^M??^-^"^jS!r ^ ^^^'^ ^"^^ ^ ^^"^^ 

[ 2] Bilous P.T., Cole S.T., Anderson W.F., Weiner J.H. Mol. Microbiol. 2:785-795(1988).
[ 3] Trieber C.A., Rothery R.A., Weiner J.H. J. Biol. Chem. 269:7103-7109(1994).

[ 4] Mejean V., Lobbi-Nivol C., Lepelletier M., Giordano G., Chippaux M., Pascal M.-C. Mol. Microbiol. 11:1169-1179
[1039] 360. Bacterial mutT domain signature

The bacterial mutT protein is involved in the GO system [1] responsible for removing an oxidatively damaged form of
guanine (8-hydroxyguanine or 7,8-dihydro-8-oxoguanine) from DNA and the nucleotide pool. 8-oxo-dGTP is inserted
opposite to dA and dC residues of template DNA with almost equal efficiency, thus leading to A.T to G.C transversions.
MutT specifically degrades 8-oxo-dGTP to the monophosphate with the concomitant release of pyrophosphate. MutT
is a small protein of about 12 to 15 Kd. It has been shown [2,3] that a region of about 40 amino acid residues of mutT
is also found in a variety of prokaryotic, viral, and eukaryotic proteins. These proteins are:

- Streptococcus pneumoniae mutX.
- A mutT homolog from plasmid pSAM2 of Streptomyces ambofaciens.
- Bartonella bacilliformis invasion protein A (gene invA).
- Escherichia coli dATP pyrophosphohydrolase.
- Protein D250 from African swine fever viruses.
- Proteins D9 and D10 from a variety of poxviruses.
- Mammalian 7,8-dihydro-8-oxoguanine triphosphatase (EC 3.1.6.-) [4].
- Mammalian diadenosine 5',5'''-P1,P4-tetraphosphate asymmetrical hydrolase (Ap4Aase) (EC 3.6.1.17) [5], which
cleaves A-5'-PPPP-5'A to yield AMP and ATP.
- A protein encoded on the antisense RNA of the basic fibroblast growth factor gene in higher vertebrates.
- Escherichia coli hypothetical protein yfaO.
[ 1] Woessner J. Jr. FASEB J. 5:2145-2154(1991).

[ 2] Sanchez-Lopez R., Nicholson R., Gesnel M.C., Matrisian L.M., Breathnach R. J. Biol. Chem. 263:11892-11899
(1988).

[ 3] Park A.J., Matrisian L.M., Kells A.F., Pearson R., Yuan Z., Navre M. J. Biol. Chem. 266:1584-1590(1991).
[ 4] Lepage T., Gache C. EMBO J. 9:3003-3012(1990).

[ 5] Kinoshita T., Fukuzawa K., Shimada T., Saito T., Matsuda Y. Proc. Natl. Acad. Sci. U.S.A. 89:4693-4697(1992).
[1033] 357. Vertebrate metallothioneins signature (metalthio)

Metallothioneins (MT) [1,2,3] are small proteins which bind heavy metals such as zinc, copper, cadmium, nickel, etc.,
through clusters of thiolate bonds. MT's occur throughout the animal kingdom and are also found in higher plants, fungi
and some prokaryotes. On the basis of structural relationships MT's have been subdivided into three classes. Class I
includes mammalian MT's as well as MT's from crustacean and molluscs, but with clearly related primary structure.
Class II groups together MT's from various species such as sea urchins, fungi, insects and cyanobacteria which display
none or only very distant correspondence to class I MT's. Class III MT's are atypical polypeptides containing
gamma-glutamylcysteinyl units. Vertebrate class I MT's are proteins of 60 to 68 amino acid residues; 20 of these residues are
cysteines that bind to 7 bivalent metal ions. As a signature pattern, a region that spans 19 residues and which contains
seven of the metal-binding cysteines was chosen; this region is located in the N-terminal section of class I MT's.
[1034] Consensus pattern: C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K
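
The statement that this signature spans 19 residues and contains seven cysteines can be checked directly against the pattern. The sketch below assumes only fixed-length repeats of the form `(n)`, which is all this pattern uses; the function name is hypothetical.

```python
import re

MT_PATTERN = "C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K"


def pattern_span(pattern):
    """Return (residues spanned, number of fixed cysteine positions)."""
    span = 0
    cysteines = 0
    for element in pattern.strip("-").split("-"):
        # An element such as x(2) covers two positions; all others cover one.
        repeat = re.search(r"\((\d+)\)$", element)
        count = int(repeat.group(1)) if repeat else 1
        span += count
        if element.startswith("C"):
            cysteines += count
    return span, cysteines
```

Applied to `MT_PATTERN`, the function returns a span of 19 residues with 7 cysteine positions, in agreement with the description above.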

[ 1] Hamer D.H. Annu. Rev. Biochem. 55:913-951(1986).

[ 2] Kagi J.H.R., Schaffer A. Biochemistry 27:8509-8515(1988).
[ 3] Binz P.-A. Thesis, 1996, University of Zurich.
[1035] 358. Mitochondrial energy transfer proteins signature (mito_carr)

Different types of substrate carrier proteins involved in energy transfer are found in the inner mitochondrial membrane
[1 to 5]. These are: - The ADP,ATP carrier protein (AAC) (ADP/ATP translocase), which exports ATP into the cytosol
and imports ADP into the mitochondrial matrix. The sequence of AAC has been obtained from various mammalian,
plant and fungal species. - The 2-oxoglutarate/malate carrier protein (OGCP), which exports 2-oxoglutarate into the
cytosol and imports malate or other dicarboxylic acids into the mitochondrial matrix. This protein plays an important
role in several metabolic processes such as the malate/aspartate and the oxoglutarate/isocitrate shuttles. - The
phosphate carrier protein, which transports phosphate groups from the cytosol into the mitochondrial matrix. - The brown
fat uncoupling protein (UCP), which dissipates oxidative energy into heat by transporting protons from the cytosol into
the mitochondrial matrix. - The tricarboxylate transport protein (or citrate transport protein), which is involved in
citrate-H+/malate exchange. It is important for the bioenergetics of hepatic cells as it provides a carbon source for fatty acid
and sterol biosyntheses, and NAD for the glycolytic pathway. - The Grave's disease carrier protein (GDC), a protein
of unknown function recognized by IgG in patients with active Grave's disease. - Yeast mitochondrial proteins MRS3
and MRS4. The exact function of these proteins is not known. They suppress a mitochondrial splice defect in the first
intron of the COB gene and may act as carriers, exerting their suppressor activity by modulating solute concentrations
in the mitochondrion. - Yeast mitochondrial FAD carrier protein (gene FLX1). - Yeast protein ACR1 [6], which seems
essential for acetyl-CoA synthetase activity. - Yeast protein PET8. - Yeast protein PMT. - Yeast protein RIM2. - Yeast
protein YHM1/SHM1. - Yeast protein YMC1. - Yeast protein YMC2. - Yeast hypothetical proteins YBR291C, YEL006w,
YER053C, YFR045W, YHR002w, and YIL006w. - Caenorhabditis elegans hypothetical protein K11H3.3. Two other
proteins have been found to belong to this family, yet are not localized in the mitochondrial inner membrane: - Maize
amyloplast Brittle-1 protein. This protein, found in the endosperm of kernels, could play a role in amyloplast membrane
transport. - Candida boidinii peroxisomal membrane protein PMP47 [7]. PMP47 is an integral membrane protein of the
peroxisome and it may play a role as a transporter. These proteins all seem to be evolutionarily related. Structurally,
they consist of three tandem repeats of a domain of approximately one hundred residues. Each of these domains
contains two transmembrane regions. As a signature pattern, one of the most conserved regions in the repeated domain
was selected, located just after the first transmembrane region.
[1036] Consensus pattern: P-x-[DE]-x-[LIVAT]-[RK]-x-[LRH]-[LIVMFY]-[QGAIVM]



[ 1] Klingenberg M. Trends Biochem. Sci. 15:108-112(1990).
[ 2] Walker J.E. Curr. Opin. Struct. Biol. 2:519-526(1992).
[ 3] Kuan J., Saier M.H. Jr. CRC Crit. Rev. Biochem. 28:209-233(1993).
[ 4] Kuan J., Saier M.H. Jr. Res. Microbiol. 144:671-672(1993).

[ 5] Nelson D.R., Lawson J.E., Klingenberg M., Douglas M.G. J. Mol. Biol. 230:1159-1170(1993).
[ 6] Palmieri F. FEBS Lett. 346:48-54(1994).

[ 7] Jank B., Habermann B., Schweyen R.J., Link T.A. Trends Biochem. Sci. 18:427-428(1993).
1997;22:177-181.

[1022] [1] Medline: 98267202. Characterization of gene repertoires at mature stage of citrus fruits through random
sequencing and analysis of redundant metallothionein-like genes expressed during fruit development. Moriguchi T,
Kita M, Hisada S, Endo-Inagaki T, Omura M; Gene 1998;211:221-227.

[1023] 354. MAGE family

[1024] The MAGE (melanoma antigen-encoding gene) family are expressed in a wide variety of tumors but not in

o^rrLtiro.rsTun^r^'^ 

S 1 £i^l"i? ^"^^3^*^ Udar N. Naiem F. Concannon P. Gatti RA; Mol Genet 

[1026] 355. Malic enzymes signature. Malic enzymes, or malate oxidoreductases, catalyze the oxidative decarboxylation
of malate into pyruvate. There are three related forms of malic enzyme: - NAD-dependent malic enzyme
(EC 1.1.1.38), which uses preferentially NAD and has the ability to
decarboxylate oxaloacetate (OAA). It is found in bacteria and insects. - NAD-dependent malic enzyme (EC 1.1.1.39),
which uses preferentially NAD and is unable to decarboxylate OAA. It is found in the mitochondrial matrix of plants
and is a heterodimer of highly related subunits. - NADP-dependent malic enzyme (EC 1.1.1.40). In mammals,
there are two isozymes: one mitochondrial and the other cytosolic. Plants also have two isozymes: chloroplastic and
cytosolic. There are two other proteins which are closely structurally related to malic enzymes: - Escherichia coli
protein sfcA. - Yeast hypothetical protein YKL029c, a probable malic enzyme. There are three well conserved regions
in the enzyme sequences. Two of them seem to be involved in binding NAD or NADP. The significance of the third one

? M?lT:r: ^ V™^ "^^^ ^^^^'^"^ ^ « signaturlSem^oSJl^S:^ 

iS2 , f"^'"- '^-''-P^-^''(2)-g-t4gsa]-x-[ivj-x-(uvmahgast](Wmf](2)- 

[1028] [ 1] Artus N.N., Edwards G.E. FEBS Lett. 182:225-233(1985). [ 2] Loeber G., Infante A.A., Maurer-Fogy I.,
Krystek E., Dworkin M.B. J. Biol. Chem. 266:3016-3021(1991). [ 3] Long J.J., Wang J.-L., Berry J.O. J. Biol. Chem.
269:2827-2833(1994).

[1029] 356. (matrixin) 

Matrixins cysteine switch (aka peptidase_M10) 

[1030] Mammalian extracellular matrix metalloproteinases (EC 3.4.24.-), also known as matrixins [1], are
zinc-dependent enzymes. They are secreted by cells in an inactive form (zymogen) that differs
from the mature enzyme by the presence of an N-terminal propeptide. A highly conserved octapeptide is found two
residues downstream of the C-terminal end of the propeptide.
This region has been called the 'cysteine switch' or 'autoinhibitor region'.
[1031] A cysteine switch has been found in the following zinc proteases:

- MMP-1 (EC 3.4.24.7) (interstitial collagenase).

- MMP-2 (EC 3.4.24.24) (72 Kd gelatinase).

- MMP-3 (EC 3.4.24.17) (stromelysin-1).

- MMP-7 (EC 3.4.24.23) (matrilysin).

- MMP-8 (EC 3.4.24.34) (neutrophil collagenase).

- MMP-9 (EC 3.4.24.35) (92 Kd gelatinase).

- MMP-10 (EC 3.4.24.22) (stromelysin-2).

- MMP-11 (EC 3.4.24.-) (stromelysin-3).

- MMP-12 (EC 3.4.24.65) (macrophage metalloelastase).

- MMP-13 (EC 3.4.24.-) (collagenase 3).

- MMP-14 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 1).

- MMP-15 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 2).

- MMP-16 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 3).

- Sea urchin hatching enzyme (EC 3.4.24.12) (envelysin) [4].

- Chlamydomonas reinhardtii gamete lytic enzyme (GLE) [5].
[0994] 343. Mandelate racemase / muconate lactonizing enzyme family signatures

Mandelate racemase (EC 5.1.2.2) (MR) and muconate lactonizing enzyme (EC 5.5.1.1) (MLE) are two bacterial enzymes
involved in aromatic acid catabolism. They catalyze mechanistically distinct reactions yet they are related at
the level of their primary, quaternary (homooctamer) and tertiary structures [1,2]. A number of other proteins also seem
to be evolutionarily related to these two enzymes. These are: - The various plasmid-encoded chloromuconate
cycloisomerases (EC 5.5.1.7). - Escherichia coli protein rspA [3]. rspA seems to be involved in the degradation of homoserine
lactone (HSL) or of one of its metabolites. - Escherichia coli hypothetical protein ycjG. - Escherichia coli hypothetical
protein yidU. - A hypothetical protein from Streptomyces ambofaciens [4]. Two signature patterns have been developed
for these enzymes; both contain conserved acidic residues.

[0995] The second pattern contains an aspartate and a glutamate which are ligands for either a magnesium ion (in
MR) or a manganese ion (in MLE).

[0996] Consensus pattern: A-x-[SAGCN]-[SAG]-[LIVM]-[DEQ]-x-A-[LA]-x-[DE]-[LIA]-x-[GA]-[KRQ]-x(4)-[PSA]-
[LIV]-x(2)-L-[LIVMF]-G-

Consensus pattern: [LIVF]-x(2)-D-x-[NH]-x(7)-[ACL]-x(6)-[LIVMF]-x(7)-[LIVM]-E-[DENQ]-P [D and E bind a divalent
metal ion]-
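As an illustration of how the annotated residues of the second pattern could be located in a sequence, the sketch below writes that pattern by hand as a regular expression and uses two capture groups to report the 1-based positions of the metal-binding aspartate and glutamate. The function name `metal_ligands` and the test sequence are hypothetical; the sequence is synthetic and was built only to satisfy the pattern.

```python
import re

# Hand translation of
# [LIVF]-x(2)-D-x-[NH]-x(7)-[ACL]-x(6)-[LIVMF]-x(7)-[LIVM]-E-[DENQ]-P
# with the metal-binding D and E wrapped in capture groups.
SECOND_PATTERN = re.compile(
    r"[LIVF].{2}(D).[NH].{7}[ACL].{6}[LIVMF].{7}[LIVM](E)[DENQ]P"
)

def metal_ligands(sequence):
    """Return 1-based positions of the D and E ligands, or None if no match."""
    match = SECOND_PATTERN.search(sequence)
    if match is None:
        return None
    return {"D": match.start(1) + 1, "E": match.start(2) + 1}

# Synthetic test sequence (an assumption for illustration, not a real protein).
synthetic = "MK" + "LAADAN" + "A" * 7 + "A" + "G" * 6 + "L" + "G" * 7 + "LEDP"
print(metal_ligands(synthetic))
```

Capture groups are used here because the pattern itself only says that a match exists; the note "D and E bind a divalent metal ion" refers to two specific positions inside that match.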

[ 1] Neidhart D.J., Kenyon G.L., Gerlt J.A., Petsko G.A. Nature 347:692-694(1990).

[ 2] Petsko G.A., Kenyon G.L., Gerlt J.A., Ringe D., Kozarich J.W. Trends Biochem. Sci. 18:372-376(1993).

[ 3] Huisman G.W., Kolter R. Science 265:537-539(1994).

[ 4] Schneider D., Aigle B., Leblond P., Simonet J.M., Decaris B. J. Gen. Microbiol. 139:2559-2567(1993).
[0997] 344. Merozoite Surface Antigen 2 (MSA-2) family 

[0998] Thomas AW, Carr DA, Carter JM, Lyon JA; Mol Biochem Parasitol 1990;43:211-220.
[0999] 345. MSP (Major sperm protein) domain.

[1000] Major sperm proteins are involved in sperm motility. These proteins oligomerise to form filaments. Partial
matches to this domain are also found in other non-MSP proteins. These include Swiss:P40075 and Swiss:P34593.
[1001] [1] Bullock TL, Roberts TM, Stewart M; J Mol Biol 1996;263:284-296. [2] King KL, Stewart M, Roberts TM,
Seavy M; J Cell Sci 1992;101:847-857.

[1002] 346. (Matrix) Viral matrix protein. Found in Morbillivirus, paramyxovirus and pneumovirus. Number of
members: 105

[1003] 347. O-methyltransferase (methyltransf)

[1004] This family includes a range of O-methyltransferases. These enzymes utilise S-adenosyl methionine.

[1005] [1] Keller NP, Dischinger HC, Bhatnagar D, Cleveland TE, Ullah AH; Appl Environ Microbiol 1993;59:479-484.

[1006] 348. Magnesium chelatase, subunit ChlI

[1007] Magnesium-chelatase is a three-component enzyme that catalyses the insertion of Mg2+ into protoporphyrin
IX. This is the first unique step in the synthesis of (bacterio)chlorophyll. Due to this, it is thought that Mg-chelatase has
an important role in channeling intermediates into the (bacterio)chlorophyll branch in response to conditions suitable
for photosynthetic growth. ChlI and BchD have molecular weights between 38 and 42 kDa.

[1008] [1] Walker CJ, Willows RD; Biochem J 1997;327:321-333. [2] Petersen BL, Jensen PE, Gibson LC, Stummann
BM, Hunter CN, Henningsen KW; J Bacteriol 1998;180:699-704.
[1009] 349. Plasmid recombination enzyme (Mob_Pre)

[1010] With some plasmids, recombination can occur in a site specific manner that is independent of RecA. In such
cases, the recombination event requires another protein called Pre. Pre is a plasmid recombination enzyme. This
protein is also known as Mob (conjugative mobilization).

[1011] [1] Priebe SD, Lacks SA; J Bacteriol 1989;171:4778-4784.

[1012] 350. Monooxygenase 

[1013] This family includes diverse enzymes that utilise FAD. 

[1014] [1] Gatti DL, Palfey BA, Lah MS, Entsch B, Massey V, Ballou DP, Ludwig ML; Science 1994;266:110-114.
[1015] 351. Mov34 family

[1016] Members of this family are found in proteasome regulatory subunits, eukaryotic initiation factor 3 (eIF3)
subunits and regulators of transcription factors.

[1017] [1] Aravind L, Ponting CP; Protein Sci 1998;7:1250-1254. [2] Hershey JW, Asano K, Naranda T, Vornlocher
HP, Hanachi P, Merrick WC; Biochimie 1996;78:903-907.
[1018] 352. Myc amino-terminal region (Myc_N_term)

[1019] The myc family belongs to the basic helix-loop-helix leucine zipper class of transcription factors, see HLH.
Myc forms a heterodimer with Max, and this complex regulates cell growth through direct activation of genes involved
in cell replication [2].

[1020] [1] Facchini LM, Penn LZ; FASEB J 1998;12:633-651. [2] Grandori C, Eisenman RN; Trends Biochem Sci



EP1 033405 A2 



-MCm.atotowwn as DNA polymerase alpha holoenzymMsscw RLFbetasubunllorROA. -mcma 

pombe). - MCM6. also known as mis5 (in S.pombe). - MCM7. also known as CDC47 or ProliferaiTA.^^ 
and MJECLiaihe presence ofa putative ATP-binding domain implies that these proteins ma^vTd 

r^ol ^**"^*'^'®^®"*"^«^^*'='°"°*'^«B motif lound in ATP-binding prote^^ «'fe9'on 
[0989] Consensus pattern: G-[IVT]-[LVAC](2)-[IVT]-D-[DE]-[FL]-[DNST]-

[ 1] Coxon A., Maundrell K., Kearsey S.E. Nucleic Acids Res. 20:5571-5577(1992).

[ 2] Hu B., Burkhart R., Schulte D., Musahl C., Knippers R. Nucleic Acids Res. 21:5289-5293(1993).

[ 3] Tye B.-K. Trends Cell Biol. 4:160-166(1994).

[ 4] Koonin E.V. Nucleic Acids Res. 21:2541-2547(1993).
[0990] 341. Macrophage migration inhibitory factor family signature (MIF)

Macrophage migration inhibitory factor (MIF) [1] seems to exert an important role in host inflammatory
responses. It plays a pivotal role in the host response to endotoxic shock and appears to serve as a pituitary 'stress'
hormone that regulates systemic inflammatory responses. MIF is a secreted protein of 115 residues which is not
processed from a larger precursor. D-dopachrome tautomerase [2] is a mammalian cytoplasmic enzyme involved in melanin
biosynthesis and that tautomerizes D-dopachrome with concomitant decarboxylation to give 5,6-dihydroxyindole (DHI).
It is a protein of 117 residues highly related to MIF. It must be noted that MIF binds glutathione and has been said to
be related to glutathione S-transferases. This assertion has been later disproved [3]. As a signature pattern for these
proteins, a conserved region was selected, located in the central section.

[0991] Consensus pattern: [DE]-P-C-A-x(3)-[LIVM]-x-S-I-G-x-[LIVM]-G-

[ 1] Bucala R. Immunol. Lett. 43:23-26(1994).

[ 2] Odh G., Hindemith A., Rosengren A.-M., Rosengren E., Rorsman H. Biochem. Biophys. Res. Commun. 197:
619-624(1993).

[ 3] Pearson W.R. Protein Sci. 3:525-527(1994).

[0992] 342. MIP family signature

Plasn^ membranes of red cells and kidney proximal and collecting tubules with high pLLS to^teT^^^^^^ 
pem,m,ng water to move in the direction of an osmotic gradient Soybean noduiti Tmaiir cl^^^^^^^ 

proiems (riP). There are various isoforms of TIP abha fseed^ namma Rt /r«^i\ %*# • * . •wHiaamiuinsic 

«codenmal cell to become an epidermoblast instead of a neuroblast. - Yeast hypothe^SrSL^nTn (iJT a .^^ 
thetical protein from the pepX reqion of lactococcus lactic Tho uuo » • P'°'«'" VFL054C. - A hypo- 

which is located in a probable cytoplasmic loop between the second and third transmembrane regions.
[0993] Consensus pattern: [HNQA]-x-N-P-[STA]-[LIVMF]-[ST]-[LIVMF]-[GSTAFY]-

[ 1] Reizer J., Reizer A., Saier M.H. Jr. CRC Crit. Rev. Biochem. 28:235-257(1993).
[ 2] Baker M.E., Saier M.H. Jr. Cell 60:185-186(1990).

[ 3] Pao G.M., Wu L.-F., Johnson K.D., Hoefte H., Chrispeels M.J., Sweet G., Sandal N.N., Saier M.H. Jr. Mol.
Microbiol. 5:33-37(1991).

[ 4] Wistow G.J., Pisano M.M., Chepelinsky A.B. Trends Biochem. Sci. 16:170-171(1991).
[ 5] Chrispeels M.J., Agre P. Trends Biochem. Sci. 19:421-425(1994).
[ 6] Flower D.R., North A.C.T., Attwood T.K. Protein Sci. 2:753-761(1993).
[ 7] Flower D.R. FEBS Lett. 333:99-102(1993).

[0984] 338. Lipoxygenases iron-binding region signatures

Lipoxygenases (EC 1.13.11.-) are a class of iron-containing dioxygenases which catalyze the hydroperoxidation of
lipids containing a cis,cis-1,4-pentadiene structure. They are common in plants where they may be involved in a number
of diverse aspects of plant physiology including growth and development, pest resistance, and senescence or responses
to wounding [1]. In mammals a number of lipoxygenase isozymes are involved in the metabolism of prostaglandins
and leukotrienes [2]. Sequence data is available for the following lipoxygenases: - Plant lipoxygenases (EC 1.13.11.12).
Plants express a variety of cytosolic isozymes as well as what seems [3] to be a chloroplast isozyme. - Mammalian
arachidonate 5-lipoxygenase (EC 1.13.11.34). - Mammalian arachidonate 12-lipoxygenase (EC 1.13.11.31). - Mammalian
erythroid cell-specific 15-lipoxygenase (EC 1.13.11.33). The iron atom in lipoxygenases is bound by four ligands,
three of which are histidine residues [4]. Six histidines are conserved in all lipoxygenase sequences; five of them are
found clustered in a stretch of 40 amino acids. This region contains two of the three histidine iron-ligands; the other
histidines have been shown [5] to be important for the activity of lipoxygenases. As signatures for this family of enzymes, two
patterns in the region of the histidine cluster were selected. The first pattern contains the first three conserved histidines
and the second pattern includes the fourth and the fifth.

Consensus pattern: H-[EQ]-x(3)-H-x-[LM]-[NQRC]-[GST]-H-[LIVMSTAC](3)-E [The second and third H's bind iron]-
[0985] Consensus pattern: [LIVMA]-H-P-[LIVM]-x-[KRQ]-[LIVMF](2)-x-[AP]-H-
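The statement that five conserved histidines cluster within a stretch of 40 amino acids can be checked on a candidate sequence by combining the two signatures. The following is a minimal sketch under stated assumptions: both patterns are translated by hand into regular expressions, the histidine offsets are read directly from the patterns, and the function name `histidine_cluster`, the `window` parameter and the test fragment are hypothetical.

```python
import re

# Hand translations of the two lipoxygenase signatures.
FIRST = re.compile(r"H[EQ].{3}H.[LM][NQRC][GST]H[LIVMSTAC]{3}E")
SECOND = re.compile(r"[LIVMA]HP[LIVM].[KRQ][LIVMF]{2}.[AP]H")

def histidine_cluster(sequence, window=40):
    """Return 1-based positions of the five conserved histidines, or None."""
    first = FIRST.search(sequence)
    if first is None:
        return None
    # The second signature is searched downstream of the first one.
    second = SECOND.search(sequence, first.end())
    if second is None:
        return None
    # Histidines sit at offsets 0, 5 and 10 of the first pattern
    # and at offsets 1 and 10 of the second pattern.
    positions = [first.start() + offset + 1 for offset in (0, 5, 10)]
    positions += [second.start() + offset + 1 for offset in (1, 10)]
    span = positions[-1] - positions[0] + 1
    return positions if span <= window else None

# Synthetic fragment built to satisfy both patterns (not a real lipoxygenase).
fragment = "MS" + "HEAAAHALNGHLLLE" + "GGGG" + "LHPLAKLLAAH" + "KK"
print(histidine_cluster(fragment))
```

A sequence in which the two signatures are separated by a long insertion would match both patterns individually but fail the window check, which is why the two conditions are tested separately.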

[ 1] Vick B.A., Zimmerman D.C. (In) Biochemistry of plants: A comprehensive treatise, Stumpf P.K., Ed., Vol. 9,
pp.53-90, Academic Press, New-York, (1987).

[ 2] Needleman P., Turk J., Jakschik B.A., Morrison A.R., Lefkowith J.B. Annu. Rev. Biochem. 55:69-102(1986).
[ 3] Peng Y.-L., Shirano Y., Ohta H., Hibino T., Tanaka K., Shibata D. J. Biol. Chem. 269:3755-3761(1994).
[ 4] Boyington J.C., Gaffney B.J., Amzel L.M. Science 260:1482-1486(1993).

[ 5] Steczko J., Donoho G.P., Clemens J.C., Dixon J.E., Axelrod B. Biochemistry 31:4053-4057(1992).
[0986] 339. Fumarate lyases signature (lyase_1)

A number of enzymes, belonging to the lyase class, for which fumarate is a substrate have been shown [1,2] to share
a short conserved sequence around a methionine which is probably involved in the catalytic activity of this type of
enzymes. These enzymes are: - Fumarase (EC 4.2.1.2) (fumarate hydratase), which catalyzes the reversible hydration
of fumarate to L-malate. There seem to be 2 classes of fumarases: class I are thermolabile dimeric enzymes (as for
example: Escherichia coli fumA and fumB); class II enzymes are thermostable and tetrameric and are found in prokaryotes
(as for example: Escherichia coli fumC) as well as in eukaryotes. The sequences of the two classes of fumarases
are not closely related. - Aspartate ammonia-lyase (EC 4.3.1.1) (aspartase), which catalyzes the reversible conversion
of aspartate to fumarate and ammonia. This reaction is analogous to that catalyzed by fumarase, except that ammonia
rather than water is involved in the trans-elimination reaction. - Argininosuccinase (EC 4.3.2.1) (argininosuccinate lyase),
which catalyzes the formation of arginine and fumarate from argininosuccinate, the last step in the biosynthesis of
arginine. - Adenylosuccinase (EC 4.3.2.2) (adenylosuccinate lyase) [3], which catalyzes the eighth step in the de novo
biosynthesis of purines, the formation of 5'-phosphoribosyl-5-amino-4-imidazolecarboxamide and fumarate from
1-(5-phosphoribosyl)-4-(N-succino-carboxamide). That enzyme can also catalyze the formation of fumarate and AMP
from adenylosuccinate. - Pseudomonas putida 3-carboxy-cis,cis-muconate cycloisomerase (EC 5.5.1.2) (3-carboxymuconate
lactonizing enzyme) (gene pcaB) [4], an enzyme involved in aromatic acids catabolism.
[0987] Consensus pattern: G-S-x(2)-M-x(2)-K-x-N-

[ 1] Woods S.A., Schwartzbach S.D., Guest J.R. Biochim. Biophys. Acta 954:14-26(1988).
[ 2] Woods S.A., Miles J.S., Guest J.R. FEMS Microbiol. Lett. 51:181-186(1988).
[ 3] Zalkin H., Dixon J.E. Prog. Nucleic Acid Res. Mol. Biol. 42:259-287(1992).

[ 4] Williams S.E., Woolridge E.M., Ransom S.C., Landro J.A., Babbitt P.C., Kozarich J.W. Biochemistry 31:
9768-9776(1992).
[0988] 340. MCM family signature and profile

Proteins shown to be required for the initiation of eukaryotic DNA replication share a highly conserved domain of about
210 amino-acid residues [1,2,3]. The latter shows some similarities [4] with that of various other families of
DNA-dependent ATPases. Eukaryotes seem to possess a family of six proteins that contain this domain. They were first
identified in yeast where most of them have a direct role in the initiation of chromosomal DNA replication by interacting
directly with autonomously replicating sequences (ARS). They were thus called 'minichromosome maintenance'
proteins, with gene symbols prefixed by MCM. These six proteins are: - MCM2, also known as cdc19 (in S. pombe) [E1].
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Ka> "^'"^ '°^'^®'-^-"'^'*^STARiq-x(0.2HDENQARK]4UVFYKCP}^lCh W-(FYWLRHhx- 

Note: it is suggested, on the basis of similarities of structure, function, and sequence, that this family forms an overall

[ 1] Cowan S.W., Newcomer M.E., Jones T.A. Proteins 8:44-61(1990).

[ 2] Igarashi M., Nagata A., Toh H., Urade Y., Hayaishi O. Proc. Natl. Acad. Sci. U.S.A. 89:5376-5380(1992).

[ 3] Flower D.R., North A.C.T., Attwood T.K. Protein Sci. 2:753-761(1993).

[ 4] Godovac-Zimmermann J. Trends Biochem. Sci. 13:64-66(1988).

[ 5] Pervaiz S., Brew K. FASEB J. 1:209-214(1987).

[ 6] Kremer J.M.H., Wilting J., Janssen L.H.M. Pharmacol. Rev. 40:1-47(1989).

[ 7] Haefliger J.-A., Peitsch M.C., Jenne D., Tschopp J. Mol. Immunol. 28:123-131(1991).

Q ^^Zi'^'' ^^^^""^ Z^Salsky RF., Findlay J.B.C. Eur. J. Biochem! 197:407-417(1991) 

[ 9] Newcomer M.E. Structure 1:7-18(1993).

[10] Collet C., Joseph R. Biochim. Biophys. Acta 1167:219-222(1993).

fS! ^lltrJ" r'tl''".''^ I" - ^'^^ N. J. BIOL Chem. 268:10425-10432(1993). 

[12] Peitsch M.C., Boguski M.S. Trends Biochem. Sci. 16:363-363(1991).

[13] Miyawaki A., Matsushita F., Ryo Y., Mikoshiba K. EMBO J. 13:5835-5842(1994).
[14] Kock K., Ahlers C., Schmale H. Eur. J. Biochem. 221:905-916(1994).

y3'l7?Sm(f99?)""^ "^^^ J- ^ Chem. 267: 

[16] Morel L., Dufaure J.-P., Depeiges A. J. Biol. Chem. 268:10274-10281(1993).

S n o -i^^ '^"^ S'°Phy«- R es. Commun. 180:69-74(1991) ' ^ 

[19] Flower D.R. FEBS Lett. 333:99-102(1993).

[0982] Cytosolic fatty-acid binding proteins signature (lip2)

n T^fJl'^ P'^'^"' *^ ^ anions are present In the cytosol 

tl;^,r^ ? are suucturally related and have prot^bly diverged fKxn a «xnmon ancestSTs ^ u^xZTI 

ten strandedantiparallel beta-barrel, albeit witha wide discontinuity between the fourth andm^^^ 

. 1 topology enclosing an internal ligand binding site |2.7J. Proteins known to belongto tWs SS;^ lii fiST 

H^r, pkXr , ^ "'""'^'"^ <''^^^^> •leart.'epidl^LdSS^" bS^etha' 

Heart FABP » also known as mammaryKJerived growth inhibitor (MDGI). a protein that remsibSS^ n^^^S^ 
of mammaor carcinoma cells. Epklermal FABP is also known as psoriasis-assa^taS F^P^?! 7„,t? ^'^^^'T^ 

Mye n PZ protein, which may be a lipid transport protein in Schwann cells P2 is associated wHh tL . 
myelin. - Schistosoma mansoni protein Sm14 [5] which seems to be hv^L ^L^SS^t^alS Tl 

[0983] Consensus pattern: [GSAIVK]-x-[FYW]-x-[LIVMF]-x(4)-[NHG]-[FY]-[DE]-x-[LIVMFY]-[LIVM]-x(2)-[LIV-

Note: it is suggested, on the basis of similarities of structure, function, and sequence, that this family forms an overall

[ 1] Bernier I., Jolles P. Biochimie 69:1127-1152(1987).

[ 2] Veerkamp J.H., Peeters R.A., Maatman R.G.H.J. Biochim. Biophys. Acta 1081:1-24(1991).

[ 3] Siegenthaler G., Hotz R., Chatellard-Gruaz D., Didierjean L., Hellman U., Saurat J.-H. Biochem. J. 302:363-371

! 2 tl^H n'^T^^li?^' » ^' E., Suzuki T.C., Welb M.A. J. Bfol. Chem. 267:380-384(1992) 

[ 5] Moser D., Tendler M., Griffiths G., Klinkert M.-Q. J. Biol. Chem. 266:8447-8454(1991).
dues]-

[ 1] McAlister-Henn L. Trends Biochem. Sci. 13:178-181(1988).

[ 2] Gietl C. Biochim. Biophys. Acta 1100:217-234(1992).

[ 3] Birktoft J.J., Rhodes G., Banaszak L.J. Biochemistry 28:6065-6081(1989).

[ 4] Cendrin F., Chroboczek J., Zaccai G., Eisenberg H., Mevarech M. Biochemistry 32:4308-4313(1993).
[0972] 334. Legume lectins signatures

Leguminous plants synthesize sugar-binding proteins which are called legume lectins [1,2]. These lectins are generally
found in the seeds. The exact function of legume lectins is not known but they may be involved in the attachment of
nitrogen-fixing bacteria to legumes and in the protection against pathogens. Legume lectins bind calcium and manganese
(or other transition metals). Legume lectins are synthesized as precursor proteins of about 230 to 260 amino acid
residues. Some legume lectins are proteolytically processed to produce two chains: beta (which corresponds to the
N-terminal) and alpha (C-terminal). The lectin concanavalin A (conA) from jack bean is exceptional in that the two chains
are transposed and ligated (by formation of a new peptide bond). The N-terminus of mature conA thus corresponds
to that of the alpha chain and the C-terminus to the beta chain. Two signature patterns specific to legume lectins have
been developed: the first is located in the C-terminal section of the beta chain and contains a conserved aspartic acid
residue important for the binding of calcium and manganese; the second one is located in the N-terminal of the alpha
chain.

[0973] Consensus pattern: [LIV]-[STAG]-V-[DEQV]-[FLI]-D-[ST] [D binds manganese and calcium]-
Consensus pattern: [LIV]-x-[EDQ]-[FYWKR]-V-x-[LIVF]-G-[LF]-[ST]-
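The chain rearrangement described for concanavalin A amounts to a circular permutation of the precursor. The sketch below illustrates only that ordering, under simplifying assumptions: a single cleavage site is assumed, no linker or terminal peptides are removed, and the function names and the placeholder strings are hypothetical.

```python
def split_chains(precursor, cleavage_site):
    """Split a precursor into the beta (N-terminal) and alpha (C-terminal) chains."""
    beta = precursor[:cleavage_site]
    alpha = precursor[cleavage_site:]
    return beta, alpha

def cona_like_mature(precursor, cleavage_site):
    """Transpose the two chains and join them, alpha first, as described for conA."""
    beta, alpha = split_chains(precursor, cleavage_site)
    return alpha + beta

# Placeholder strings: upper case marks the beta chain, lower case the alpha chain.
precursor = "BBBBBB" + "aaaa"
print(split_chains(precursor, 6))
print(cona_like_mature(precursor, 6))
```

In this picture the mature protein starts with the alpha chain and ends with the beta chain, which is the ordering stated in the text for mature conA.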

[ 1] Sharon N., Lis H. FASEB J. 4:3198-3208(1990).

[ 2] Lis H., Sharon N. Annu. Rev. Biochem. 55:33-37(1986).

[0974] 335. CoA-ligases (ligase-CoA)

[0975] This family includes the CoA ligases Succinyl-CoA synthetase alpha and beta chains, malate CoA ligase and
ATP-citrate lyase. Some members of the family utilise ATP, others use GTP.

[0976] [1] Wolodko WT, Fraser ME, James MN, Bridger WA; J Biol Chem 1994;269:10883-10890.

[0977] 336. Linker histone H1 and H5 family

[0978] Linker histone H1 is an essential component of chromatin structure. H1 links nucleosomes into higher order
structures. Histone H1 is replaced by histone H5 in some cell types.

[0979] [1] Ramakrishnan V, Finch JT, Graziano V, Lee PL, Sweet RM; Nature 1993;362:219-223.
[0980] 337. Lipocalin signature (lip1)

Proteins which transport small hydrophobic molecules such as steroids, bilins, retinoids, and lipids share limited regions
of sequence homology and a common tertiary structure architecture [1 to 5]. This is an eight stranded antiparallel
beta-barrel with a repeated +1 topology enclosing an internal ligand binding site [1,3]. The name 'lipocalin' has been proposed
[5] for this protein family. Proteins known to belong to this family are listed below (references are only provided for
recently determined sequences). - Alpha-1-microglobulin (protein HC), which seems to bind porphyrin. - Alpha-1-acid
glycoprotein (orosomucoid), which can bind a remarkable array of natural and synthetic compounds [6]. - Aphrodisin
which, in hamsters, functions as an aphrodisiac pheromone. - Apolipoprotein D, which probably binds heme-related
compounds. - Beta-lactoglobulin, a milk protein whose physiological function appears to be to bind retinol. - Complement
component C8 gamma chain, which seems to bind retinol [7]. - Crustacyanin [8], a protein from lobster carapace, which
binds astaxanthin, a carotenoid. - Epididymal-retinoic acid binding protein (E-RABP) [9] involved in sperm maturation.
- Insectacyanin, a moth bilin-binding protein, and a related butterfly bilin-binding protein (BBP). - Late Lactation protein
(LALP), a milk protein from tammar wallaby [10]. - Neutrophil gelatinase-associated lipocalin (NGAL) (p25) (SV-40
induced 24p3 protein) [11]. - Odorant-binding protein (OBP), which binds odorants. - Plasma retinol-binding proteins
(PRBP). - Human pregnancy-associated endometrial alpha-2 globulin. - Probasin (PB), a rat prostatic protein. -
Prostaglandin D synthase (EC 5.3.99.2) (GSH-independent PGD synthetase), a lipocalin with enzymatic activity [12]. -
Purpurin, a retinal protein which binds retinol and heparin. - Quiescence specific protein p20K from chicken (embryo
CH21 protein). - Rodent urinary proteins (alpha-2-microglobulin), which may bind pheromones. - VNSP 1 and 2, putative
pheromone transport proteins from mouse vomeronasal organ [13]. - Von Ebner's gland protein (VEGP) [14] (also
called tear lipocalin), a mammalian protein which may be involved in taste recognition. - A frog olfactory protein, which
may transport odorants. - A protein found in the cerebrospinal fluid of the toad Bufo marinus with a supposed function
similar to transthyretin in transport across the blood brain barrier [15]. - Lizard's epididymal secretory protein IV (LESP
IV), which could transport small hydrophobic molecules into the epididymal fluid during sperm maturation [16]. -
Prokaryotic outer-membrane protein blc [17]. The sequences of most members of the family, the core or kernel lipocalins, are
characterized by three short conserved stretches of residues [3,18]. Others, the outlier lipocalin group, share only one
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[09611 330 (Lum binding) Riboflavin synthase alpha chain family Lum4)inding site signature The fono»rinn omtoin^ 
have been Shown (1 .2) to be structurally and evolutionary related: - Riboflavin synthase^ ctei^ 17 
ribC in Escherichia coli. ribB in Bacillus subtilis and Photobacterium lek^^S^tfSf^ (RS^tpha) (gene 

ribo„avin,ro,ntwornolesot6.7-di,ne.hy«-,r-D.ra,5S^^^^^ 

^^^tT^TJ^"^- "'^ ''^""^ °' ^'"'^^ ^''W^' t° higher energy SZZ^ 

vravelength). LwnP binds non-covalentty to 6.7-dimethvl^-/i •-D-ribitvn luma^inA wh,^ fi^^> i '^',1 , isnoner 

proteh (YFP) (gene luxY). Like LumR YFP modub.es liremi^St^a C^^STnT^pS^ 

site tor Lum.RS^Ipha which binds two molecules of Lum has two perfect copies of this mottf whnL i ..JIp^J^m 
one mole«.te of Lum. has a Glu instead of Lys/Arg : the flrst posiSon o, i^^^^^^^'^'^t^^ YF? 

SSut^JlT ''^'^ '° ^ ^''^'^ dystuncional b^lng sHe^^sub^SJeS 

""i^^ P°^'°"°* «>Py °' the motif. Our signature pattern includes the Lum*iSing mot» 

[0962] Consensus pattern: [LIVMF]-x(5)-G-[STADNQ]-[KREQIYW]-V-N-[LIVM]-E-

[ 1] O'Kane D.J., Woodward B., Lee J., Prasher D.C. Proc. Natl. Acad. Sci. U.S.A. 88:1100-1104(1991).

[ 2] O'Kane D.J., Prasher D.C. Mol. Microbiol. 6:443-449(1992).

[0963] 331. Lysyl oxidase putative copper-binding region signature

[0964] Consensus pattern: W-E-W-H-S-C-H-Q-H-Y-H

[0965] [ 1] Krebs C.J., Krawetz S.A. Biochim. Biophys. Acta 1202:7-12(1993).
[0966] 332. Metallo-beta-lactamase superfamily (lactamase_B)

[0967] [1] Neuwald AF, Liu JS, Lipman DJ, Lawrence CE; Nucleic Acids Res 1997;25:1665-1677. [2] Carfi A, Pares S,
Duee E, Galleni M, Duez C, Frere JM, Dideberg O; EMBO J 1995;14:4914-4921.

[0968] 333. L-lactate dehydrogenase active site (ldh1)

(LDH-C). found only m the spermatozoa ol mammals and birds In birds and crocodilian i<.n.-r rTi^o 

a structural protein and is known as epsilon.« p). L-2-hydro^:^ri te'C's^^^^^^^^^ , ^LT^nm 

[3J catalyzes the reversible and stereospecific interconversion betl^^T^^ ^^^l^- ^ ."^^ 

boxylic acids. L-hicDH is evolutfonary riated to £?s ra^i^TrefJ" ^ ^ L-2-hydroxy-car- 

a conserved histidine which is essential to the catalytic mechanism.

[0969] Consensus pattern: [LIVMA]-G-[EQ]-H-G-[DN]-[ST] [H is the active site residue]-

! J! ^'^I*?.^ ' ^'"""^ ^ • Rossmann MG. J. Mol. Biol. 198-445^7(19871 

Lir8s"7" ::7lT8S^^^^ '"^^ ^'"^^ ' - - -^".^P- ^tt Acad. Sci. 

[ 3] Lerch H.-P., Frank R., Collins J. Gene 83:263-270(1989).

[0970] Malate dehydrogenase active site signature (ldh2)

prokaryotfc organisms contains a single loan of MDH in euL^SiTcSr^^rt^i^^^^ " Wh«e 
in the mitochondrial matrix and the other in the cyt5.Z F^^^lnTpSs a L^i^fTr " "^'^ 

functus in |he gl^bte pathway. In p^ts chCas. there ^' ^ad^rnatrDXe a^.^^^^^^ 
LL1M> wheh « essential for both the universal C3 photosynthesis (Calvin) cycle and the more «,SizidC4™r^ 

[0971] c^sensus pa„em: l-^^-T-nnK^m^.^Z^ZlZ^^^^^ 
(for recent listings see [1,2,3]): - Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp). - Escherichia coli
lipoprotein-28 (gene nlpA). - Escherichia coli lipoprotein-34 (gene nlpB). - Escherichia coli lipoprotein nlpC. -
Escherichia coli lipoprotein nlpD. - Escherichia coli osmotically inducible lipoprotein B (gene osmB). - Escherichia coli
osmotically inducible lipoprotein E (gene osmE). - Escherichia coli peptidoglycan-associated lipoprotein (gene pal). -
Escherichia coli rare lipoproteins A and B (genes rlpA and rlpB). - Escherichia coli copper homeostasis protein cutF
(or nlpE). - Escherichia coli plasmids traT proteins. - Escherichia coli Col plasmids lysis proteins. - A number of Bacillus
beta-lactamases. - Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA). - Borrelia burgdorferi outer
surface proteins A and B (genes ospA and ospB). - Borrelia hermsii variable major protein 21 (gene vmp21) and 7
(gene vmp7). - Chlamydia trachomatis outer membrane protein 3 (gene omp3). - Fibrobacter succinogenes
endoglucanase cel-3. - Haemophilus influenzae proteins Pal and Pcp. - Klebsiella pullulanase (gene pulA). - Klebsiella
pullulanase secretion protein pulS. - Mycoplasma hyorhinis protein p37. - Mycoplasma hyorhinis variant surface antigens
A, B, and C (genes vlpABC). - Neisseria outer membrane protein H.8. - Pseudomonas aeruginosa lipopeptide (gene
lppL). - Pseudomonas solanacearum endoglucanase egl. - Rhodopseudomonas viridis reaction center cytochrome
subunit (gene cytC). - Rickettsia 17 Kd antigen. - Shigella flexneri invasion plasmid proteins mxiJ and mxiM. -
Streptococcus pneumoniae oligopeptide transport protein A (gene amiA). - Treponema pallidum 34 Kd antigen. - Treponema
pallidum membrane protein A (gene tmpA). - Vibrio harveyi chitobiase (gene chb). - Yersinia virulence plasmid protein
yscJ. - Halocyanin from Natronobacterium pharaonis [4], a membrane associated copper-binding protein. This is the
first archaebacterial protein known to be modified in such a fashion. From the precursor sequences of all these proteins,
a consensus pattern and a set of rules to identify this type of post-translational modification were derived.
[0957] Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment
site] Additional rules: 1) The cysteine must be between positions 15 and 35 of the sequence in consideration. 2) There
must be at least one Lys or one Arg in the first seven positions of the sequence.

[ 1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990).
[ 2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988).
[ 3] von Heijne G. Protein Eng. 2:531-534(1989).

[ 4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945
(1994).

[0958] 329. (Lipoprotein 5) Prokaryotic membrane lipoprotein lipid attachment site. In prokaryotes, membrane lipoproteins
are synthesized with a precursor signal peptide, which is cleaved by a specific lipoprotein signal peptidase
(signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue to
which a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such processing currently
include (for recent listings see [1,2,3]): - Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp). - Escherichia
coli lipoprotein-28 (gene nlpA). - Escherichia coli lipoprotein-34 (gene nlpB). - Escherichia coli lipoprotein
nlpC. - Escherichia coli lipoprotein nlpD. - Escherichia coli osmotically inducible lipoprotein B (gene osmB). - Escherichia
coli osmotically inducible lipoprotein E (gene osmE). - Escherichia coli peptidoglycan-associated lipoprotein (gene pal).
- Escherichia coli rare lipoproteins A and B (genes rplA and rplB). - Escherichia coli copper homeostasis protein cutF
(or nlpE). - Escherichia coli plasmids traT proteins. - Escherichia coli Col plasmids lysis proteins. - A number of Bacillus
beta-lactamases. - Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA). - Borrelia burgdorferi outer
surface proteins A and B (genes ospA and ospB). - Borrelia hermsii variable major protein 21 (gene vmp21) and 7
(gene vmp7). - Chlamydia trachomatis outer membrane protein 3 (gene omp3). - Fibrobacter succinogenes endoglucanase
cel-3. - Haemophilus influenzae proteins Pal and Pep. - Klebsiella pullulanase (gene pulA). - Klebsiella pullulanase
secretion protein pulS. - Mycoplasma hyorhinis protein p37. - Mycoplasma hyorhinis variant surface antigens
A, B, and C (genes vlpABC). - Neisseria outer membrane protein H.8. - Pseudomonas aeruginosa lipopeptide (gene
lppL). - Pseudomonas solanacearum endoglucanase egl. - Rhodopseudomonas viridis reaction center cytochrome
subunit (gene cytC). - Rickettsia 17 Kd antigen. - Shigella flexneri invasion plasmid proteins mxiJ and mxiM. - Streptococcus
pneumoniae oligopeptide transport protein A (gene amiA). - Treponema pallidum 34 Kd antigen. - Treponema
pallidum membrane protein A (gene tmpA). - Vibrio harveyi chitobiase (gene chb). - Yersinia virulence plasmid protein
yscJ. - Halocyanin from Natrobacterium pharaonis [4], a membrane associated copper-binding protein (this is the first
archaebacterial protein known to be modified in such a fashion). From the precursor sequences of all these proteins,
a consensus pattern and a set of rules to identify this type of post-translational modification have been developed.
[0959] Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment
site] Additional rules: 1) The cysteine must be between positions 15 and 35 of the sequence in consideration. 2) There
must be at least one Lys or one Arg in the first seven positions of the sequence.
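
The consensus pattern and the two additional rules above can be combined into a simple screening routine. The following is a minimal sketch in Python, assuming a hand translation of the pattern into a regular expression. The function name `lipid_attachment_sites` is hypothetical and is not part of the original disclosure.

```python
import re

# Hand translation of {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C.
# {DERK} means "any residue except D, E, R, K". The lookahead lets
# overlapping candidate matches be reported as well.
LIPOBOX = re.compile(r"(?=([^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C))")


def lipid_attachment_sites(seq):
    """Return 1-based positions of cysteines satisfying the pattern and both rules."""
    seq = seq.upper()
    # Rule 2: at least one Lys or one Arg in the first seven positions.
    if not any(residue in "KR" for residue in seq[:7]):
        return []
    sites = []
    for match in LIPOBOX.finditer(seq):
        # The cysteine is the last residue of the matched segment.
        cys_pos = match.start(1) + len(match.group(1))
        # Rule 1: the cysteine must lie between positions 15 and 35.
        if 15 <= cys_pos <= 35:
            sites.append(cys_pos)
    return sites
```

A sequence that matches the pattern but violates either rule returns an empty list, so the routine is stricter than a bare pattern scan.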

[0960] [ 1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990). [ 2] Klein P., Somorjai R.L., Lau P.C.K.
Protein Eng. 2:15-20(1988). [ 3] von Heijne G. Protein Eng. 2:531-534(1989). [ 4] Mattar S., Scharf B., Kent S.B.H.,
Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945(1994).
[0951] Consensus pattern: [LIVM]-[PA]-x(2)-C-x-[LIVM]-x-[LIVM]-x-[LIVMFY]-x-[LIVM]-[...]-[LIVM] [The two Cs are involved in disulfide bonds]

[1] Wirtz K.W.A. Annu. Rev. Biochem. 60:73-99(1991).
[2] Arondel V., Kader J.C. Experientia 46:579-585(1990).

[3] Ohlrogge J.B., Browse J., Somerville C.R. Biochim. Biophys. Acta 1082:1-26(1991).
[0952] 326. (LAMP) Lysosome-associated membrane glycoproteins signatures

Lysosome-associated membrane glycoproteins (lamp) [1] are integral membrane proteins, specific to lysosomes and
whose exact biological function is not yet clear. Structurally, the lamp proteins consist of two internally homologous
lysosome-luminal domains separated by a proline-rich hinge region; at the C-terminal extremity there is a transmembrane
region followed by a very short cytoplasmic tail. In each of the duplicated domains, there are two conserved
disulfide bonds. This structure is schematically represented in the figure below.



xCxxxxxCxxxxxxxxxxxxCxxxxxCxxxxxxxxxCxxxxxCxxxxxxxxxxxxCxxxxxCxxxxxxxx
<------- domain 1 ------->< Hinge ><------- domain 2 ------>< TM >< C >

In mammals, there are two closely related types of lamp: lamp-1 and lamp-2. In chicken, lamp-1 is known as LEP100. The
macrophage protein CD68 (or macrosialin) [2] is a heavily glycosylated integral membrane protein whose structure
consists of a mucin-like domain followed by a proline-rich hinge; a single lamp-like domain; a transmembrane region
and a short cytoplasmic tail. Two signature patterns for this family of proteins were developed. The first one is centered
on the first conserved cysteine of the duplicated domains. The second corresponds to a region that includes the extremity
of the second domain, the totality of the transmembrane region and the cytoplasmic tail.

[0953] Consensus pattern: [STA]-C-[LIVM]-[LIVMFYW]-A-x-[LIVMFYW]-x(3)-[LIVMFYW]-x(3)-Y [C is involved in a
disulfide bond]

Consensus pattern: C-x(2)-D-x(3,4)-[LIVM](2)-P-[LIVM]-x-[LIVM]-G-x(2)-[LIVM]-x-G-[LIVM](2)-x-[LIVM](4)-A-[FY]-x-
[LIVM]-x(2)-[KR]-[RH]-x(1,2)-[STAG](2)-Y-[EQ] [C is involved in a disulfide bond]

[ 1] Fukuda M. J. Biol. Chem. 266:21327-21330(1991).

[ 2] Holness C.L., da Silva R.P., Fawcett J., Gordon S., Simmons D.L. J. Biol. Chem. 268:9661-9666(1993).
[0954] 327. Lipolytic enzymes 'G-D-S-L' family, serine active site

[0955] Recently [1], a family of lipolytic enzymes has been characterized. This family currently consists of the following
proteins:

- Aeromonas hydrophila lipase/phosphatidylcholine-sterol acyltransferase.
- Xenorhabdus luminescens lipase 1.
- Vibrio mimicus arylesterase.
- Escherichia coli acyl-coA thioesterase I (gene tesA).
- Vibrio parahaemolyticus thermolabile hemolysin/atypical phospholipase.
- Rabbit phospholipase AdRab-B, an intestinal brush border protein with esterase and phospholipase A/lysophospholipase
  activity that could be involved in the uptake of dietary lipids. AdRab-B contains four repeats of about
  320 amino acids.
- Arabidopsis thaliana and Brassica napus anther-specific proline-rich protein APG.
- A Pseudomonas putida hypothetical protein in trpE-trpG intergenic region.

A serine has been identified as part of the active site in the Aeromonas, Vibrio mimicus and Escherichia coli enzymes.
It is located in a conserved sequence motif that can be used as a signature pattern for these proteins.

Consensus pattern: [LIVMFYAG](4)-G-D-S-[LIVM]-x(1,2)-[TAG]-G [S is the active site residue]
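
Because the pattern marks the serine of the G-D-S motif as the active site residue, a scan can report the position of that serine directly. The sketch below is illustrative only: it assumes a hand-written regular expression equivalent of the pattern, and the name `active_site_serines` and the test peptide used with it are invented for the example.

```python
import re

# Hand translation of [LIVMFYAG](4)-G-D-S-[LIVM]-x(1,2)-[TAG]-G.
# "x(1,2)" (one or two arbitrary residues) becomes ".{1,2}".
GDSL = re.compile(r"(?=([LIVMFYAG]{4}GDS[LIVM].{1,2}[TAG]G))")


def active_site_serines(seq):
    """Return 1-based positions of the serine in every G-D-S-[LIVM] signature match."""
    # Four hydrophobic residues, then G and D, precede the serine,
    # so it is the seventh residue of each match.
    return [match.start(1) + 7 for match in GDSL.finditer(seq.upper())]
```

Replacing the serine in a matching sequence removes the hit, which is a quick way to confirm that the reported position is the one the pattern depends on.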

[0956] 328. (Lipoprotein 4) Prokaryotic membrane lipoprotein lipid attachment site. In prokaryotes, membrane lipoproteins
are synthesized with a precursor signal peptide, which is cleaved by a specific lipoprotein signal peptidase
(signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue to which
a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such processing currently include
division of vulval blast cells. - Vertebrate insulin gene enhancer binding protein isl-1. Isl-1 binds to one of the two cis-acting
protein-binding domains of the insulin gene. - Vertebrate homeobox proteins lim-1, lim-2 (lim-5) and lim-3. -
Vertebrate lmx-1, which acts as a transcriptional activator by binding to the FLAT element; a beta-cell-specific transcriptional
enhancer found in the insulin gene. - Mammalian LH-2, a transcriptional regulatory protein involved in the
control of cell differentiation in developing lymphoid and neural cell types. - Drosophila protein apterous, required for
the normal development of the wing and halter imaginal discs. - Vertebrate protein kinases LIMK-1 and LIMK-2. -
Mammalian rhombotins. Rhombotin 1 (RBTN1 or TTG-1) and rhombotin-2 (RBTN2 or TTG-2) are proteins of about
160 amino acids whose genes are disrupted by chromosomal translocations in T-cell leukemia. - Mammalian and avian
cysteine-rich protein (CRP), a 192 amino-acid protein of unknown function. Seems to interact with zyxin. - Mammalian
cysteine-rich intestinal protein (CRIP), a small protein which seems to have a role in zinc absorption and may function
as an intracellular zinc transport protein. - Vertebrate paxillin, a cytoskeletal focal adhesion protein. - Mouse testin.
Mouse testin should not be confused with rat testin which is a thiol protease homolog. - Sunflower pollen specific protein
SF3. - Chicken zyxin. Zyxin is a low-abundance adhesion plaque protein which has been shown to interact with CRP.
- Yeast protein LRG1 which is involved in sporulation [4]. - Yeast rho-type GTPase activating protein RGA1/DBM1. -
Caenorhabditis elegans homeobox protein ceh-14. - Caenorhabditis elegans homeobox protein unc-97. - Yeast hypothetical
protein YKR090w. - Caenorhabditis elegans hypothetical proteins C28H8.6. These proteins generally have two
tandem copies of a domain, called LIM (for Lin-11 Isl-1 Mec-3) in their N-terminal section. Zyxin and paxillin are exceptions
in that they contain respectively three and four LIM domains at their C-terminal extremity. In apterous, isl-1, LH-2,
lin-11, lim-1 to lim-3, lmx-1 and ceh-14 and mec-3 there is a homeobox domain some 50 to 95 amino acids after
the LIM domains. In the LIM domain, there are seven conserved cysteine residues and a histidine. The arrangement
followed by these conserved residues is C-x(2)-C-x(16,23)-H-x(2)-[CH]-x(2)-C-x(2)-C-x(16,21)-C-x(2,3)-[CHD]. The
LIM domain binds two zinc ions [5]. LIM does not bind DNA, rather it seems to act as interface for protein-protein interaction.
A pattern was developed that spans the first half of the LIM domain.

[0947] Consensus pattern: C-x(2)-C-x(15,21)-[FYWH]-H-x(2)-[CH]-x(2)-C-x(2)-C-x(3)-[LIVMF] [The 5 C's and the H
bind zinc]
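
Consensus patterns written in this notation can be converted mechanically into regular expressions for scanning sequences. The following Python sketch shows one possible translation. It assumes that repeat ranges are written with a comma, as in x(15,21), and it handles only the elements used in this description: single residues, x, bracketed sets, braced exclusion sets and repeat counts. The function name `prosite_to_regex` is hypothetical.

```python
import re

# One pattern element: x, a single residue, [allowed set] or {excluded set},
# optionally followed by a repeat count (n) or a range (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a consensus pattern such as C-x(2)-C into a regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)
```

The resulting string can be passed to `re.search` or `re.finditer`. Patterns damaged by scanning, for example with a period in place of a comma inside a range, need to be corrected before translation, and the function raises `ValueError` on elements it does not recognise.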

[ 1] Freyd G., Kim S.K., Horvitz H.R. Nature 344:876-879(1990).

[ 2] Baltz R., Evrard J.-L., Domon C., Steinmetz A. Plant Cell 4:1465-1466(1992).

[ 3] Sanchez-Garcia I., Rabbitts T.H. Trends Genet. 10:315-320(1994).

[ 4] Mueller A., Xu G., Wells R., Hollenberg C.P., Piepersberg W. Nucleic Acids Res. 22:3151-3154(1994).

[ 5] Michelsen J.W., Schmeichel K.L., Beckerle M.C., Winge D.R. Proc. Natl. Acad. Sci. U.S.A. 90:4404-4408(1993).

[0948] 324. (LRR) Leucine Rich Repeat 

CAUTION: This Pfam may not find all Leucine Rich Repeats in a protein. Leucine Rich Repeats are short sequence
motifs present in a number of proteins with diverse functions and cellular locations. These repeats are usually involved
in protein-protein interactions. Each Leucine Rich Repeat is composed of a beta-alpha unit. These units form elongated
non-globular structures. Leucine Rich Repeats are often flanked by cysteine rich domains. Number of members: 3017
[1] The leucine-rich repeat: a versatile binding motif. Kobe B, Deisenhofer J; Trends Biochem Sci 1994;19:415-421.
[2] Crystal structure of porcine ribonuclease inhibitor, a protein with leucine-rich repeats. Kobe B, Deisenhofer J; Nature
1993;366:751-756.

[0949] 325. Plant lipid transfer protein family signature (LTP) 

[0950] Plant cells contain proteins, called lipid transfer proteins (LTP) [1,2,3], which are able to facilitate the transfer
of phospholipids and other lipids across membranes. These proteins, whose subcellular location is not yet known, could
play a major role in membrane biogenesis by conveying phospholipids such as waxes or cutin from their site of biosynthesis
to membranes unable to form these lipids. Plant LTP's are proteins of about 9 Kd (90 amino acids) which
contain eight conserved cysteine residues all involved in disulfide bridges, as shown in the following schematic representation.



xCxxxxCxxxxxxCCxxxxxxxxCxCxxxxxxxxxxxCxxxxxxCxx

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.
gation of nonexchange chromosomes during meiosis. - Human CENP-E [4]. CENP-E is a protein that associates with
kinetochores during chromosome congression, relocates to the spindle midzone at anaphase, and is quantitatively
discarded at the end of the cell division. CENP-E is probably an important motor molecule in chromosome movement
and/or spindle elongation. - Human mitotic kinesin-like protein-1 (MKLP-1), a motor protein whose activity is directed
toward the microtubule's plus end. - Yeast KAR3 protein, which is essential for yeast nuclear fusion during mating.
KAR3 may mediate microtubule sliding during nuclear fusion and possibly mitosis. - Yeast CIN8 and KIP1 proteins
which are required for the assembly of the mitotic spindle. Both proteins seem to interact with spindle microtubules to
produce an outwardly directed force acting upon the poles. - Fission yeast cut7 protein, which is essential for spindle
body duplication during mitotic division. - Emericella nidulans bimC, which plays an important role in nuclear division.
- Emericella nidulans klpA. - Caenorhabditis elegans unc-104, which may be required for the transport of substances
needed for neuronal cell differentiation. - Caenorhabditis elegans osm-3. - Xenopus Eg5, which may be involved in
mitosis. - Arabidopsis thaliana KatA, KatB and katC. - Chlamydomonas reinhardtii FLA10/KHP1 and KLP1. Both proteins
seem to play a role in the rotation or twisting of the microtubules of the flagella. - Caenorhabditis elegans hypothetical
protein T09A5.2. The kinesin motor domain is located in the N-terminal part of most of the above proteins, with
the exception of KAR3, klpA, and ncd where it is located in the C-terminal section. The kinesin motor domain contains
about 330 amino acids. An ATP-binding motif of type A is found near position 80 to 90; the C-terminal half of the domain is
involved in microtubule-binding. The signature pattern for that domain is derived from a conserved decapeptide inside
the microtubule-binding part.

Consensus pattern: [GSA]-[KRHPSTQVM]-[LIVMF]-x-[LIVMF]-[IVC]-D-L-[AH]-G-[SAN]-E

[ 1] Bloom G.S., Endow S.A. Protein Prof. 2:1109-1171(1995).

[ 2] Vallee R.B., Shpetner H.S. Annu. Rev. Biochem. 59:909-932(1990).

[ 3] Brady S.T. Trends Cell Biol. 5:159-164(1995).

[ 4] Endow S.A. Trends Biochem. Sci. 16:221-225(1991). [E1]

[0942] 321. Ribosomal protein L15 signature

Ribosomal protein L15 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L15 is known to bind
the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups: -
Eubacterial L15. - Plant chloroplast L15 (nuclear-encoded). - Archaebacterial L15. - Vertebrate L27a. - Tetrahymena
thermophila L29. - Fungi L27a (L29, CRP-1, CYH2). L15 is a protein of 144 to 154 amino-acid residues. As a signature
pattern, a conserved region was selected in the C-terminal section of these proteins.

[0943] Consensus pattern: K-[LIVM](2)-[GASL]-x-[GT]-x-[LIVMA]-x(2,5)-[LIVM]-x-[LIVMF]-x(3,4)-[LIVMFCA]-[ST]-x
(2)-A-x(3)-[LIVM]-x(3)-G

[0944] [ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[0945] 322. LBP / BPI / CETP family signature

The following mammalian lipid-binding serum glycoproteins belong to the same family [1,2,3]: - Lipopolysaccharide-binding
protein (LBP). LBP binds to the lipid A moiety of bacterial lipopolysaccharides (LPS), a glycolipid present in
the outer membrane of all Gram-negative bacteria. The LBP/LPS complex seems to interact with the CD14 receptor
and may be responsible for the secretion of alpha-TNF. - Bactericidal permeability-increasing protein (BPI). Like LBP,
BPI binds LPS and has a cytotoxic activity on Gram-negative bacteria. - Cholesteryl ester transfer protein (CETP).
CETP is involved in the transfer of insoluble cholesteryl esters in reverse cholesterol transport. - Phospholipid transfer
protein (PLTP). May play a key role in extracellular phospholipid transport and modulation of HDL particles. These
proteins are structurally related and share many regions of sequence similarities. As a signature pattern one of these
regions was selected, which is located in the N-terminal section of these proteins; a region which could be involved in
the binding to the lipids [2].

Consensus pattern: [PA]-[GA]-[LIVMC]-x(2)-R-[IV]-[ST]-x(3)-L-x(5)-[EQ]-x(4)-[LIVM]-[EQK]-x(8)-P

[ 1] Schumann R.R., Leong S.R., Flaggs G.W., Gray P.W., Wright S.D., Mathison J.C., Tobias P.S., Ulevitch R.J.
Science 249:1429-1431(1990).

[ 2] Gray P.W., Flaggs G., Leong S.R., Gumina R.J., Weiss J., Ooi C.E., Elsbach P. J. Biol. Chem. 264:9505-9509
(1989).

[ 3] Day J.R., Albers J.J., Lofton-Day C.E., Gilbert T.L., Ching A.F.T., Grant F.J., O'Hara P.J., Marcovina S.M.,
Adolphson J.L. J. Biol. Chem. 269:9388-9391(1994).

[0946] 323. LIM domain signature and profile

Recently [1,2] a number of proteins have been found to contain a conserved cysteine-rich domain of about 60 amino-acid
residues. These proteins are: - Caenorhabditis elegans mec-3; a protein required for the differentiation of the set
of six touch receptor neurons in this nematode. - Caenorhabditis elegans lin-11; a protein required for the asymmetric
which inhibit both cathepsin D (aspartic proteinase) and trypsin. - Alpha-amylase/subtilisin inhibitors from barley and
wheat. - Albumin-1 (WBA-1) from goa bean seeds [3]. - Miraculin from Richadella dulcifica [4], a sweet taste protein.
- Sporamin from sweet potato [5], the major tuberous root protein. - Thiol proteinase inhibitor PCPI 8.3 (P340) from
potato tuber [6]. - Wound responsive protein gwin3 from poplar tree [7]. - 21 Kd seed protein from cocoa [8]. All these
proteins contain from 170 to 200 amino acid residues and one or two intrachain disulfide bonds. The best conserved
region is found in their N-terminal section and is used as a signature pattern.
[0938] Consensus pattern: [LIVM]-x-D-x-[EDNTY]-[DG]-[RKHDENQ]-x-[LIVM]-x(5)-Y-x-[LIVM]

[ 1] Laskowski M., Kato I. Annu. Rev. Biochem. 49:593-626(1980).

[ 2] Ritonja A., Krizaj I., Mesko P., Kopitar M., Lucovnik P., Strukelj B., Pungercar J., Buttle D.J., Barrett A.J., Turk
V. FEBS Lett. 267:13-15(1990).

[ 3] Kortt A.A., Strike P.M., de Jersey J. Eur. J. Biochem. 181:403-408(1989).

[ 4] Theerasilp S., Hitotsuya H., Nakajo S., Nakaya K., Nakamura Y., Kurihara Y. J. Biol. Chem. 264:6655-6659
(1989).

[ 5] Hattori T., Yoshida N., Nakamura K. Plant Mol. Biol. 13:563-572(1989).

[ 6] Krizaj I., Drobnic-Kosorok M., Brzin J., Jerala R., Turk V. FEBS Lett. 333:15-20(1993).

[ 7] Bradshaw H.D., Hollick J.B., Parsons T.J., Clarke H.R.G., Gordon M.P. Plant Mol. Biol. 14:51-59(1989).

[ 8] Tai H., McHenry L., Fritz P.J., Furtek D.B. Plant Mol. Biol. 16:913-915(1991).

[0939] 319. Beta-ketoacyl synthases active site

Beta-ketoacyl-ACP synthase (KAS) [1] is the enzyme that catalyzes the condensation of malonyl-ACP with the growing
fatty acid chain. It is found as a component of the following enzymatic systems: - Fatty acid synthetase (FAS), which
catalyzes the formation of long-chain fatty acids from acetyl-CoA, malonyl-CoA and NADPH. Bacterial and plant chloroplast
FAS are composed of eight separate subunits which correspond to different enzymatic activities; beta-ketoacyl
synthase is one of these polypeptides. Fungal FAS consists of two multifunctional proteins, FAS1 and FAS2; the beta-ketoacyl
synthase domain is located in the C-terminal section of FAS2. Vertebrate FAS consists of a single multifunctional
chain; the beta-ketoacyl synthase domain is located in the N-terminal section [2]. - The multifunctional 6-methylsalicylic
acid synthase (MSAS) from Penicillium patulum [3]. This is a multifunctional enzyme involved in the biosynthesis
of a polyketide antibiotic and which has a KAS domain in its N-terminal section. - Polyketide antibiotic synthase
enzyme systems. Polyketides are secondary metabolites produced by microorganisms and plants from simple fatty
acids. KAS is one of the components involved in the biosynthesis of the Streptomyces polyketide antibiotics granaticin
[4], tetracenomycin C [5] and erythromycin. - Emericella nidulans multifunctional protein Wa. Wa is involved in the
biosynthesis of conidial green pigment. Wa is a protein of 216 Kd that contains a KAS domain. - Rhizobium nodulation
protein nodE, which probably acts as a beta-ketoacyl synthase in the synthesis of the nodulation Nod factor fatty acyl
chain. - Yeast mitochondrial protein CEM1. The condensation reaction is a two step process: the acyl component of
an activated acyl primer is transferred to a cysteine residue of the enzyme and is then condensed with an activated
malonyl donor with the concomitant release of carbon dioxide. The sequence around the active site cysteine is well
conserved and can be used as a signature pattern.

[0940] Consensus pattern: G-x(4)-[LIVMFAP]-x(2)-[AGC]-C-[STA](2)-[STAG]-x(3)-[LIVMF] [C is the active site residue]

[ 1] Kauppinen S., Siggaard-Andersen M., von Wettstein-Knowles P. Carlsberg Res. Commun. 53:357-370(1988).

[ 2] Witkowski A., Rangan V.S., Randhawa Z.I., Amy C.M., Smith S. Eur. J. Biochem. 198:571-579(1991).

[ 3] Beck J., Ripka S., Siegner A., Schiltz E., Schweizer E. Eur. J. Biochem. 192:487-498(1990).

[ 4] Bibb M.J., Biro S., Motamedi H., Collins J.F., Hutchinson C.R. EMBO J. 8:2727-2736(1989).

[ 5] Sherman D.H., Malpartida F., Bibb M.J., Kieser H.M., Bibb M.J., Hopwood D.A. EMBO J. 8:2717-2725(1989).

[0941] 320. Kinesin motor domain signature and profile

Kinesin [1,2,3] is a microtubule-associated force-producing protein that may play a role in organelle transport. Kinesin
is an oligomeric complex composed of two heavy chains and two light chains. The kinesin motor activity is directed
toward the microtubule's plus end. The heavy chain is composed of three structural domains: a large globular N-terminal
domain which is responsible for the motor activity of kinesin (it is known to hydrolyze ATP, to bind and move on microtubules),
a central alpha-helical coiled coil domain that mediates the heavy chain dimerization; and a small globular C-terminal
domain which interacts with other proteins (such as the kinesin light chains), vesicles and membranous organelles.
A number of proteins have been recently found that contain a domain similar to that of the kinesin 'motor'
domain [1,4,E1]: - Drosophila claret segregational protein (ncd). Ncd is required for normal chromosomal segregation
in meiosis, in females, and in early mitotic divisions of the embryo. The ncd motor activity is directed toward the microtubule's
minus end. - Drosophila kinesin-like protein (nod). Nod is required for the distributive chromosome segre-
[ 1] Neuwald A.F., York J.D., Majerus P.W. FEBS Lett. 294:16-18(1991).

[ 2] Glaeser H.-U., Thomas D., Gaxiola R., Montrichard F., Surdin-Kerjan Y., Serrano R. EMBO J. 12:3105-3110
(1993).

[ 3] Bone R., Springer J.P., Atack J.R. Proc. Natl. Acad. Sci. U.S.A. 89:10031-10035(1992).
[0924] 313. Ion transport protein

[0925] This family contains sodium, potassium and calcium ion channels. This family is 6 transmembrane helices in which
the last two helices flank a loop which determines ion selectivity. In some sub-families (e.g. Na channels) the domain
is repeated four times, whereas in others (e.g. K channels) the protein forms as a tetramer in the membrane. A bacterial
structure of the protein is known for the last two helices but is not in the Pfam family due to it lacking the first four helices.

[0926] 314. Isocitrate and isopropylmalate dehydrogenases signature (isodh)

Isocitrate dehydrogenase (IDH) [1,2] is an important enzyme of carbohydrate metabolism which catalyzes the oxidative
decarboxylation of isocitrate into alpha-ketoglutarate. IDH is either dependent on NAD+ (EC 1.1.1.41) or on NADP+
(EC 1.1.1.42). In eukaryotes there are at least three isozymes of IDH: two are located in the mitochondrial matrix (one
NAD+-dependent, the other NADP+-dependent), while the third one (also NADP+-dependent) is cytoplasmic. In Escherichia
coli the activity of a NADP+-dependent form of the enzyme is controlled by the phosphorylation of a serine
residue; the phosphorylated form of IDH is completely inactivated. 3-isopropylmalate dehydrogenase (EC 1.1.1.85)
(IMDH) [3,4] catalyzes the third step in the biosynthesis of leucine in bacteria and fungi, the oxidative decarboxylation
of 3-isopropylmalate into 2-oxo-4-methylvalerate. Tartrate dehydrogenase (EC 1.1.1.93) [5] catalyzes the reduction of
tartrate to oxaloglycolate. These enzymes are evolutionary related [1,3,4,5]. The best conserved region of these enzymes
is a glycine-rich stretch of residues located in the C-terminal section. This region was used as a signature pattern.
KoAS^rf™ ''^SHUKm>(FYDN]^3-(Dlfn4IM\^x4ST6DNHDNl-x(2)4SGAPJ-x(3.4)-G!STGh 

[ 1] Hurley J.H., Thorsness P.E., Ramalingam V., Helmers N.H., Koshland D.E. Jr., Stroud R.M. Proc. Natl. Acad.
Sci. U.S.A. 86:8635-8639(1989).

[ 2] Cupp J.R., McAlister-Henn L. J. Biol. Chem. 266:22199-22205(1991).

[ 3] Imada K., Sato M., Tanaka N., Katsube Y., Matsuura Y., Oshima T. J. Mol. Biol. 222:725-738(1991).

[ 4] Zhang T., Koshland D.E. Jr. Protein Sci. 4:84-92(1995).

[ 5] Tipton P.A., Beecher B.S. Arch. Biochem. Biophys. 313:15-21(1994).

[0928] 315. Jacalin-like lectin domain.

[0929] Proteins containing this domain are lectins. It is found in 1 to 6 copies in these proteins. The domain is also
found in the animal prostatic spermine-binding protein (Swiss:P15501).

[1] Sankaranarayanan R, Sekar K, Banerjee R, Sharma V, Surolia A, Vijayan M; Nat Struct Biol 1996;3:
596-603.

[0931] 316. KH domain

KH motifs probably bind RNA directly. Autoantibodies to Nova, a KH domain protein, cause paraneoplastic
opsoclonus ataxia.

[1] Burd CG, Dreyfuss G; Science 1994;265:615-621.

[2] Musco G, Stier G, Joseph C, Castiglione Morelli MA, Nilges M, Gibson TJ, Pastore A; Cell 1996;85:237-245.
[0933] 317. Kelch motif

[0934] The kelch motif was initially discovered in Kelch (Swiss:Q04652). In this protein there are six copies of the
motif. It has been shown that Swiss:Q04652 is related to Galactose Oxidase [1] for which a structure has been solved
[2]. The kelch motif forms a beta sheet. Several of these sheets associate to form a beta propeller structure as found
in [...].

[1] Bork P, Doolittle RF; J Mol Biol 1994;236:1277-1282. [2] Ito N, Phillips SE, Stevens C, Ogel ZB, McPherson
MJ, Keen JN, Yadav KD, Knowles PF; Nature 1991;350:87-90.

[0936] 318. Soybean trypsin inhibitor (Kunitz) protease inhibitors family signature

[0937] The soybean trypsin inhibitor (Kunitz) family [1] is one of the numerous families of proteinase inhibitors. It
comprises plant proteins which have inhibitory activity against serine proteinases from the trypsin and subtilisin families,
thiol proteinases and aspartic proteinases as well as some proteins that are probably involved in seed storage. This
family is currently known to group the following proteins: - Trypsin inhibitors A, B, C, KTI1, and KTI2 from soybean. -
Trypsin inhibitor [...] from coral beans (Erythrina sp.). - Trypsin inhibitor DE5 from sandal bead tree. - Trypsin inhibitors
1A (WTI-1A), 1B (WTI-1B), and 2 (WTI-2) from goa bean. - Trypsin inhibitor from Acacia confusa. - Trypsin inhibitor
from silk tree. - Chymotrypsin inhibitor 3 (WCI-3) from goa bean. - Cathepsin D inhibitors PDI and NDI from potato [2]
- Klebsiella pneumoniae protein pqqF. This protein is required for the biosynthesis of the coenzyme pyrrolo-quinoline-quinone
  (PQQ). It is thought to be a protease that cleaves peptide bonds in a small peptide (gene pqqA) thus
  providing the glutamate and tyrosine residues necessary for the synthesis of PQQ.
- Yeast protein AXL1, which is involved in axial budding [3].
- Eimeria bovis sporozoite developmental protein.
- Escherichia coli hypothetical protein yddC and HI1368, the corresponding Haemophilus influenzae protein.
- Bacillus subtilis hypothetical protein ymxG.
- Caenorhabditis elegans hypothetical proteins C28F5.4 and F56D2.1.

[0914] It should be noted that in addition to the above enzymes, this family also includes the core proteins I and II
of the mitochondrial bc1 complex (also called cytochrome c reductase or complex III), but the situation as to the activity
or lack of activity of these subunits is quite complex:

- In mammals and yeast, core proteins I and II lack enzymatic activity.
- In Neurospora crassa and in potato, core protein I is equivalent to the beta subunit of MPP.
- In Euglena gracilis, core protein I seems to be active, while subunit II is inactive.

[0915] These proteins do not share many regions of sequence similarity; the most noticeable is in the N-terminal
section. This region includes a conserved histidine followed, two residues later, by a glutamate and another histidine.
In pitrilysin, it has been shown [4] that this H-x-x-E-H motif is involved in enzyme activity; the two histidines bind zinc
and the glutamate is necessary for catalytic activity. Non active members of this family have lost from one to three of
these active site residues. We developed a signature pattern that detects active members of this family as well as some
inactive members.
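
The relationship between the H-x-x-E-H motif and activity can be illustrated with a short check that reports which of the three key residues a family member has lost. This is a sketch only: it assumes the five-residue window has already been aligned to the motif by some other means, and the names `lost_residues` and `find_intact_motifs` are hypothetical.

```python
import re

# Positions (0-based) of the three key residues within the H-x-x-E-H window:
# the two zinc-binding histidines and the catalytic glutamate.
KEY_POSITIONS = {0: "H", 3: "E", 4: "H"}


def lost_residues(window):
    """List the key residues missing from a five-residue window aligned to H-x-x-E-H."""
    if len(window) != 5:
        raise ValueError("window must contain exactly five residues")
    window = window.upper()
    return [
        f"{expected}{index + 1}"
        for index, expected in KEY_POSITIONS.items()
        if window[index] != expected
    ]


def find_intact_motifs(seq):
    """Return 1-based start positions of every intact H-x-x-E-H motif in seq."""
    return [match.start() + 1 for match in re.finditer(r"(?=H..EH)", seq.upper())]
```

An empty list from `lost_residues` means the window retains all three residues. A list of one to three entries corresponds to the loss described above for non active members.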

[0916] Consensus pattern: G-x(8,9)-G-x-[STA]-H-[LIVMFY]-[LIVMC]-[DERN]-[HRKL]-[LMFAT]-x-[LFSTH]-x-
[GSTAN]-[GST] [The two H are zinc ligands] [E is the active site residue]. Sequences known to belong to this class
detected by the pattern: ALL active members as well as all MPP alpha subunits and core II subunits. Does not detect
inactive core I subunits.

[0917] Note: these proteins belong to family M16 in the classification of peptidases [5].

[ 1] Rawlings N.D., Barrett A.J. Biochem. J. 275:389-391(1991).

[ 2] Braun H.-P., Schmitz U.K. Trends Biochem. Sci. 20:171-175(1995).

[ 3] Becker A.B., Roth R.A. Proc. Natl. Acad. Sci. U.S.A. 89:3835-3839(1992).

[ 4] Fujita A., Oka C., Arikawa Y., Katagai T., Tonouchi A., Kuhara S., Misumi Y. Nature 372:567-570(1994).
[ 5] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[0918] 310. Involucrin repeat 

[0919] Eckert RL, Yaffe MB, Crish JF, Murthy S, Rorke EA, Welter JF; J Invest Dermatol 1993;100:613-617.
[0920] 311. Isochorismatase family. This family are hydrolase enzymes.

[0921] Romao MJ, Turk D, Gomis-Ruth FX, Huber R, Schumacher G, Mollering H, Russmann L; J Mol Biol 1992;
226:1111-1130.

[0922] 312. Inositol monophosphatase family signatures (inositol_P)

It has been shown [1] that several proteins share two sequence motifs. Two of these proteins are enzymes of the
inositol phosphate second messenger signaling pathway: - Vertebrate and plants inositol monophosphatase (EC
3.1.3.25). - Vertebrate inositol polyphosphate 1-phosphatase (EC 3.1.3.57). The function of the other proteins is not
yet clear: - Bacterial protein cysQ. CysQ could help to control the pool of PAPS (3'-phosphoadenoside 5'-phosphosulfate),
or be useful in sulfite synthesis. - Escherichia coli protein suhB. Mutations in suhB result in the enhanced synthesis
of heat shock sigma factor (htpR). - Neurospora crassa protein Qa-X. Probably involved in quinate metabolism.
- Emericella nidulans protein qutG. Probably involved in quinate metabolism. - Yeast protein HAL2/MET22 [2] involved
in salt tolerance as well as methionine biosynthesis. - Yeast hypothetical protein YHR046c. - Caenorhabditis
elegans hypothetical protein F13G3.5. - A Rhizobium leguminosarum hypothetical protein encoded upstream of
the pss gene for exopolysaccharide synthesis. - Methanococcus jannaschii hypothetical protein MJ0109. It is suggested
[1] that these proteins may act by enhancing the synthesis or degradation of phosphorylated messenger molecules.
From the X-ray structure of human inositol monophosphatase [3], it seems that some of the conserved residues are
involved in binding a metal ion and/or the phosphate group of the substrate.

[0923] Consensus pattern: [FWV]-x(0,1)-[LIVM]-D-P-[LIVM]-D-[SG]-[ST]-x(2)-[FY]-x-
[HKRNSTY] [The first D and the T bind a metal ion]

Consensus pattern: [WV]-D-x-[AC]-[GSA]-[GSAPV]-x-[LIVACP]-[LIV]-[LIVAC]-x(3)-[GH]-[GA]-
GMP into IMP [3]. It converts nucleobase, nucleoside and nucleotide derivatives of G to A nucleotides, and maintains
intracellular balance of A and G nucleotides. IMP dehydrogenase and GMP reductase share many regions of sequence
similarity. One of these regions is centered on a cysteine residue thought [3] to be involved in binding IMP. This region
was used as a signature pattern.

[0906] Consensus pattern: [LIVM]-[RK]-[LIVM]-G-[LIVM]-G-x-G-S-[LIVM]-C-x-T [C is the putative IMP-binding residue]
[ 1] Collart F.R., Huberman E. J. Biol. Chem. 263:15769-15772(1988).

[ 2] Natsumeda Y., Ohno S., Kawasaki H., Konno Y., Weber G., Suzuki K. J. Biol. Chem. 265:5292-5295(1990).
[ 3] Andrews S.C., Guest J.R. Biochem. J. 255:35-43(1988).

[0907] 306. (IPPc) Inositol polyphosphate phosphatase family, catalytic domain 
[0908] [1] York JD, Ponder JW, Chen ZW, Mathews FS, Majerus PW; Biochemistry 1994;33:13164-13171.
[2] Jefferson AB, Auethavekiat V, Pot DA, Williams LT, Majerus PW; J Biol Chem 1997;272:5983-5988.
[3] Zhang X, Jefferson AB, Auethavekiat V, Majerus PW; Proc Natl Acad Sci U S A 1995;92:4853-4856.
[4] York JD, Majerus PW; Proc Natl Acad Sci U S A 1990;87:9548-9552.
[5] Neuwald AF, York JD, Majerus PW; FEBS Lett 1991;294:16-18.
[0909] 307. IQ calmodulin-binding motif 

[1] Xie X, Harrison DH, Schlichting I, Sweet RM, Kalabokis VN, Szent-Gyorgyi AG, Cohen C; Nature 1994;368:
306-312.

[2] Rhoads AR, Friedberg F; FASEB J 1997;11:331-340.

[0910] 308. Inosine-uridine preferring nucleoside hydrolase family signature (IU_nuc_hydro)

Inosine-uridine preferring nucleoside hydrolase (EC 3.2.2.1) (IU-nucleoside hydrolase or IUNH) is an enzyme first
identified in protozoa [1] that catalyzes the hydrolysis of all of the commonly occurring purine and pyrimidine nucleosides
into ribose and the associated base, but has a preference for inosine and uridine as substrates. This enzyme is important
for these parasitic organisms, which are deficient in de novo synthesis of purines, to salvage the host purine nucleosides.
IUNH from Crithidia fasciculata has been sequenced and characterized; it is a homotetrameric enzyme of subunits
of 34 Kd. A histidine has been shown to be important for the catalytic mechanism; it acts as a proton donor to activate
the hypoxanthine leaving group. IUNH is evolutionarily related to a number of uncharacterized proteins from various
biological sources, notably: - Escherichia coli hypothetical protein yaaF. - Escherichia coli hypothetical protein ybeK.
- Escherichia coli hypothetical protein yeiK. - Fission yeast hypothetical protein SpAC17G8.02. - Yeast hypothetical
protein YDR400w. - A hypothetical protein from the archaebacterium Desulfurolobus ambivalens. As a signature pattern
for these proteins, a highly conserved region located in the N-terminal extremity was selected. This region contains
four conserved aspartates that have been shown [2] to be located in the active site cavity.
[0911] Consensus pattern: D-x-D-[PT]-[GA]-x-D-D-[TAV]-[VI]-A

[ 1] Gopaul D.N., Meyer S.L., Degano M., Sacchettini J.C., Schramm V.L. Biochemistry 35:5963-5970(1996).
[ 2] Degano M., Gopaul D.N., Scapin G., Schramm V.L., Sacchettini J.C. Biochemistry 35:5971-5981(1996).
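
As an illustration only, the IUNH consensus pattern, read here as D-x-D-[PT]-[GA]-x-D-D-[TAV]-[VI]-A, can be hand-translated into a regular expression in order to locate the four conserved aspartates in a candidate sequence. The sketch below is in Python; the regular expression is our own transcription of the pattern, the function name is hypothetical, and the test sequence is invented for the example rather than taken from any database.

```python
import re

# Hand-translated regex for D-x-D-[PT]-[GA]-x-D-D-[TAV]-[VI]-A (our transcription)
IUNH_SIGNATURE = re.compile(r"D.D[PT][GA].DD[TAV][VI]A")

# 0-based offsets, within a match, of the four conserved aspartates
ASP_OFFSETS = (0, 2, 6, 7)


def conserved_aspartates(sequence):
    """Return the 1-based positions of the four signature aspartates, or None."""
    match = IUNH_SIGNATURE.search(sequence)
    if match is None:
        return None
    return [match.start() + offset + 1 for offset in ASP_OFFSETS]


# Hypothetical sequence, invented for illustration only
toy_sequence = "MKKIILDCDPGHDDAIALLLA"
print(conserved_aspartates(toy_sequence))
```

Positions are reported in the 1-based numbering normally used for protein residues; a sequence without the signature yields None.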

[0912] 309. (Insulinase)
Insulinase family, zinc-binding region signature
(aka Peptidase_M16)

[0913] A number of proteases dependent on divalent cations for their activity have been shown [1,2] to belong to
one family, on the basis of sequence similarity. These enzymes are listed below.
Insulinase (EC 3.4.24.56) (also known as insulysin or insulin-degrading enzyme or IDE), a cytoplasmic enzyme
which seems to be involved in the cellular processing of insulin, glucagon and other small polypeptides.
Escherichia coli protease III (EC 3.4.24.55) (pitrilysin) (gene ptr), a periplasmic enzyme that degrades small peptides.

Mitochondrial processing peptidase (EC 3.4.24.64) (MPP). This enzyme removes the transit peptide from the precursor
form of proteins imported from the cytoplasm across the mitochondrial inner membrane. It is composed of
two nonidentical homologous subunits termed alpha and beta. The beta subunit seems to be catalytically active
while the alpha subunit has probably lost its activity.

Nardilysin (EC 3.4.24.61) (N-arginine dibasic convertase or NRD convertase). This mammalian enzyme cleaves
peptide substrates on the N-terminus of Arg residues in dibasic stretches.



[ 2] Atomi K., Ueda M., Hikida M., Hishida T., Teranishi Y., Tanaka A. J. Biochem. 107:262-266(1990).
[0890] 299. Initiation factor 2 subunit

[0891] This family includes initiation factor 2B alpha, beta and delta subunits from eukaryotes, related proteins from
archaebacteria and IF-2 from prokaryotes. Initiation factor 2 binds to Met-tRNA, GTP and the small ribosomal subunit.
[0892] [1] Kyrpides NC, Woese CR; Proc Natl Acad Sci U S A 1998;95:3726-3730.
[0893] 300. Initiation factor 3 signature

Initiation factor 3 (IF-3) (gene infC) [1] is one of the three factors required for the initiation of protein biosynthesis in
bacteria. IF-3 is thought to function as a fidelity factor during the assembly of the ternary initiation complex, which consists
of the 30S ribosomal subunit, the initiator tRNA and the messenger RNA. IF-3 binds to the 30S ribosomal subunit; it
is a basic protein of 141 to 212 residues. The chloroplast initiation factor IF-3(chl) is a protein that enhances the poly
(A,U,G)-dependent binding of the initiator tRNA to chloroplast ribosomal 30S subunits. In its mature form it is a protein
of about 400 residues whose central section is evolutionarily related to the sequence of bacterial IF-3 [2]. As a signature
pattern a highly conserved region was selected, located in the central section of bacterial IF-3 and of IF-3(chl).
[0894] Consensus pattern: [KR]-[LIVM](2)-[DN]-[FY]-[GSN]-[KR]-[LIVMFYS]-x-[FY]-[DEQTH]-x(2)-[KRQ]

[ 1] Liveris D., Schwartz J.J., Geertman R., Schwartz I. FEMS Microbiol. Lett. 112:211-216(1993).
[ 2] Lin Q., Ma L., Burkhart W., Spremulli L.L. J. Biol. Chem. 269:9436-9444(1994).

[0895] 301. Imidazoleglycerol-phosphate dehydratase signatures (IGPD)

Imidazoleglycerol-phosphate dehydratase (EC 4.2.1.19) is the enzyme that catalyzes the seventh step in the biosynthesis
of histidine in bacteria, fungi and plants. In most organisms it is a monofunctional protein of about 22 to 29 Kd.
In some bacteria such as Escherichia coli it is the C-terminal domain of a bifunctional protein that includes a
histidinol-phosphatase domain [1]. Two signature patterns were developed that each include two consecutive histidine residues.
[0896] Consensus pattern: [LIVMY]-[DE]-x-H-H-x(2)-E-x(2)-[GCA]-[LIVM]-[STAC]-[LIVM]
Consensus pattern: G-x-[DN]-x-H-H-x(2)-E-[STAGC]-x-[FY]-K

[0897] [ 1] Carlomagno M.S., Chiariotti L., Alifano P., Nappo A.G., Bruni C.B. J. Mol. Biol. 203:585-606(1988).
[0898] 302. Indole-3-glycerol phosphate synthase signature (IGPS)

Indole-3-glycerol phosphate synthase (EC 4.1.1.48) (IGPS) catalyzes the fourth step in the biosynthesis of tryptophan:
the ring closure of 1-(2-carboxyphenylamino)-1-deoxyribulose into indol-3-glycerol-phosphate. In some bacteria, IGPS
is a single chain enzyme. In others, such as Escherichia coli, it is the N-terminal domain of a bifunctional enzyme
that also catalyzes N-(5'-phosphoribosyl)anthranilate isomerase (PRAI) activity, the third step of tryptophan biosynthesis.
In fungi, IGPS is the central domain of a trifunctional enzyme that also contains a PRAI C-terminal domain and a
glutamine amidotransferase N-terminal domain. The N-terminal section of IGPS contains a highly conserved region
which X-ray crystallography studies [1] have shown to be part of the active site cavity. This region was used as a
signature pattern for IGPS.

[0899] Consensus pattern: [LIVMFY]-[LIVMC]-x-E-[LIVMFYC]-K-[KRSP]-[STAK]-S-P-[ST]-x(3)-[LIVMFYST]
[0900] [ 1] Wilmanns M., Priestle J.P., Niermann T., Jansonius J.N. J. Mol. Biol. 223:477-507(1992).
[0901] 303. (IL2) Interleukin 2. 31 members

[0902] 304. (ILVD_EDD) Dihydroxy-acid and 6-phosphogluconate dehydratases. Two dehydratases have been
shown [1] to be evolutionarily related: - Dihydroxy-acid dehydratase (EC 4.2.1.9) (gene ilvD or ILV3), which catalyzes
the fourth step in the biosynthesis of isoleucine and valine, the dehydration of 2,3-dihydroxy-isovaleric acid into
alpha-ketoisovaleric acid. - 6-phosphogluconate dehydratase (EC 4.2.1.12) (gene edd), which catalyzes the first step in the
Entner-Doudoroff pathway, the dehydration of 6-phospho-D-gluconate into 6-phospho-2-dehydro-3-deoxy-D-gluconate.
- Escherichia coli hypothetical protein yjhG. Both enzymes are proteins of about 600 amino acid residues. Two
highly conserved regions have been developed as signature patterns. The first pattern is located in the N-terminal part
and contains a cysteine that could be involved in the binding of a 2Fe-2S iron-sulfur cluster [2]. The second pattern is
located in the C-terminal half.

[0903] Consensus pattern: C-D-K-x(2)-P-[GA]-x(3)-[GA] [The C could be a 2Fe-2S ligand]
Consensus pattern: [SA]-L-[LIVM]-T-D-[GA]-R-[LIVMF]-S-[GA]-[GAV]-[ST]

[0904] [ 1] Egan S.E., Fliege R., Tong S., Shibata A., Wolf R.E. Jr., Conway T. J. Bacteriol. 174:4638-4646(1992).
[ 2] Velasco J.A., Cansado J., Pena M.C., Kawakami T., Laborda J., Notario V. Gene 137:179-185(1993).
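
Because this family is described by two signatures, one N-terminal and one C-terminal, a candidate sequence can be screened for both. The following Python sketch shows one way this might be done. The regular expressions are our own transcriptions of the two consensus patterns (read as C-D-K-x(2)-P-[GA]-x(3)-[GA] and [SA]-L-[LIVM]-T-D-[GA]-R-[LIVMF]-S-[GA]-[GAV]-[ST]), the function names are hypothetical, and the sequences are invented for illustration.

```python
import re

# Hand-written regex equivalents of the two consensus patterns (our transcription)
N_TERMINAL = re.compile(r"CDK.{2}P[GA].{3}[GA]")
C_TERMINAL = re.compile(r"[SA]L[LIVM]TD[GA]R[LIVMF]S[GA][GAV][ST]")


def signature_hits(sequence):
    """Report which of the two signatures occur in a protein sequence."""
    return {
        "n_terminal": N_TERMINAL.search(sequence) is not None,
        "c_terminal": C_TERMINAL.search(sequence) is not None,
    }


def has_both_signatures(sequence):
    """True only when both signatures are present."""
    return all(signature_hits(sequence).values())


# Hypothetical sequences, invented for illustration only
toy_sequences = {
    "both": "MAAACDKTIPGMLMAGGGGSLLTDGRFSGGSKK",
    "n_only": "MAAACDKTIPGMLMAGGGG",
    "neither": "MKKLLLQQQ",
}

for name, seq in toy_sequences.items():
    print(name, signature_hits(seq), has_both_signatures(seq))
```

Requiring both hits is a stricter filter than either pattern alone; whether that trade-off is appropriate depends on how complete the candidate sequences are.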
[0905] 305. IMP dehydrogenase / GMP reductase signature 

IMP dehydrogenase (EC 1.1.1.205) (IMPDH) catalyzes the rate-limiting reaction of de novo GTP biosynthesis, the
NAD-dependent oxidation of IMP into XMP [1]. Inhibition of IMP dehydrogenase activity results in the cessation of DNA
synthesis. As IMP dehydrogenase is associated with cell proliferation, it is a possible target for cancer chemotherapy.
Mammalian and bacterial IMPDHs are tetramers of identical chains. There are two IMP dehydrogenase isozymes in
humans [2]. GMP reductase (EC 1.6.6.8) catalyzes the irreversible and NADPH-dependent reductive deamination of

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxHHHHHHHHtttHHHHHHHHHxxxxxxxxxx

[ 1] Gehring W.J. (In) Guidebook to the homeobox genes, Duboule D., Ed., pp1-10, Oxford University Press, Oxford.

[ 2] Buerglin T.R. (In) Guidebook to the homeobox genes

[ 3] Gehring W.J. Trends Biochem. Sci. 17:277-280(1992).

[ 4] Gehring W.J., Hiromi Y. Annu. Rev. Genet. 20:147-173(1986).

[ 5] Schofield P.N. Trends Neurosci. 10:3-6(1987). 

[0883] 'Homeobox' antennapedia-type protein signature (home2)

The homeotic Hox proteins are sequence-specific transcription factors. They are part of a developmental regulatory
system that provides cells with specific positional identities on the anterior-posterior (A-P) axis [1]. The Hox proteins
contain a 'homeobox' domain. In Drosophila and other insects, there are eight different Hox genes that are encoded in
two gene complexes, ANT-C and BX-C. In vertebrates the Hox genes are organized in four complexes. In six of the
eight Drosophila Hox genes the homeobox domain is highly similar and a conserved hexapeptide is found five to
sixteen amino acids upstream of the homeobox domain.

[0884] Consensus pattern: [LIVMFE]-[FY]-P-W-M-[KRQTA]

[ 1] McGinnis W., Krumlauf R. Cell 68:283-302(1992).
[ 2] Scott M.P. Cell 71:551-553(1992).

[0885] 'Homeobox' engrailed-type protein signature (home3)

[0887] Consensus pattern: L-M-A-[EQ]-G-L-Y-N

[ 1] Scott M.P., Tamkun J.W., Hartzell G.W. III Biochim. Biophys. Acta 989:25-48(1989).

[ 2] Gehring W.J. Science 236:1245-1252(1987).

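[0888] 298. Isocitrate lyase signature (ICL)

Isocitrate lyase (EC 4.1.3.1) catalyzes the conversion of isocitrate to succinate and glyoxylate, the first step in the
glyoxylate bypass, an alternative to the tricarboxylic acid cycle in bacteria, fungi and plants.
A cysteine, a histidine and a glutamate or aspartate have been found to be important for the enzyme's catalytic activity.
Only one cysteine residue is conserved between the sequences of the fungal, plant and bacterial enzymes; it is located
in the middle of a conserved hexapeptide that can be used as a signature pattern for this type of enzyme.
[0889] Consensus pattern: K-[KR]-C-G-H-[LMQ] [C is a putative active site residue]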



[ 1] Beeching J.R. Protein Seq. Data Anal. 2:463-466(1989).
[0871] 296. Histone H2A signature (his1)

Histone H2A is one of the four histones, along with H2B, H3 and H4, which forms the eukaryotic nucleosome core.
Using alignments of histone H2A sequences [1,2,E1], a conserved region in the N-terminal part of H2A was selected
as a signature pattern. This region is conserved both in classical S-phase regulated H2A's and in variant histone H2A's,
which are synthesized throughout the cell cycle.
[0872] Consensus pattern: [AC]-G-L-x-F-P-V

[ 1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991).

[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).

[0873] Histone H4 signature (his2) 

[0874] Histone H4 is one of the four histones, along with H2A, H2B and H3, which forms the eukaryotic nucleosome
core. Along with H3, it plays a central role in nucleosome formation. The sequence of histone H4 has remained almost
invariant in more than 2 billion years of evolution [1,E1]. The region used as a signature pattern is a pentapeptide found
in positions 14 to 18 of all H4 sequences. It contains a lysine residue which is often acetylated [2] and a histidine residue
which is implicated in DNA-binding [3].
[0875] Consensus pattern: G-A-K-R-H

[ 1] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).

[ 2] Doenecke D., Gallwitz D. Mol. Cell. Biochem. 44:113-128(1982).

[ 3] Ebralidse K.K., Grachev S.A., Mirzabekov A.D. Nature 331:365-367(1988).
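
The statement that the H4 signature occupies positions 14 to 18 can be checked with a plain substring search. The Python sketch below takes the pentapeptide as G-A-K-R-H, which is our reading of the consensus pattern, and uses an illustrative N-terminal fragment of a histone H4 sequence (initiator methionine removed) that is assumed for the example and not taken from the text above; the function name is hypothetical.

```python
# Illustrative N-terminal fragment of a histone H4 sequence (initiator Met removed);
# an assumption made for this example only.
H4_N_TERMINUS = "SGRGKGGKGLGKGGAKRHRKVLRDNIQGI"

# The signature pentapeptide, read here as G-A-K-R-H
SIGNATURE = "GAKRH"


def signature_span(sequence, signature=SIGNATURE):
    """Return the 1-based (start, end) of the first occurrence, or None if absent."""
    index = sequence.find(signature)
    if index == -1:
        return None
    return index + 1, index + len(signature)


print(signature_span(H4_N_TERMINUS))
```

With this fragment, the lysine and histidine of the pentapeptide fall at residues 16 and 18 in 1-based numbering.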

[0876] Histone H3 signatures (his3) 

Histone H3 is one of the four histones, along with H2A, H2B and H4, which forms the eukaryotic nucleosome core. It
is a highly conserved protein of 135 amino acid residues [1,2,E1]. The following proteins have been found to contain
a C-terminal H3-like domain: - Mammalian centromeric protein CENP-A [3]. Could act as a core histone necessary for
the assembly of centromeres. - Yeast chromatin-associated protein CSE4 [4]. - Caenorhabditis elegans chromosome
III encodes two highly related proteins (F54C8.2 and F58A4.3) whose C-terminal section is evolutionarily related to the
last 100 residues of H3. The function of these proteins is not yet known. Two signature patterns were developed. The
first one corresponds to a perfectly conserved heptapeptide in the N-terminal part of H3. The second one is derived
from a conserved region in the central section of H3.
[0877] Consensus pattern: K-A-P-R-K-Q-L
Consensus pattern: P-F-x-[RA]-L-[VA]-[KRQ]-[DEG]-[IV]

[ 1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991).

[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).

[ 3] Sullivan K.F., Hechenberger M., Masri K. J. Cell Biol. 127:581-592(1994).

[ 4] Stoler S., Keith K.C., Curnick K.E., Fitzgerald-Hayes M. Genes Dev. 9:573-586(1995).

[0878] Histone H2B signature (his4) 

[0879] Histone H2B is one of the four histones, along with H2A, H3 and H4, which forms the eukaryotic nucleosome
core. Using alignments of histone H2B sequences [1,2,E1], a conserved region was selected in the C-terminal part
of H2B.

[0880] Consensus pattern: [KR]-E-[LIVM]-[EQ]-T-x(2)-[KR]-x-[LIVM](2)-x-[PAG]-[DE]-L-x-[KR]-H-A-[LIVM]-[STA]-
E-G

[ 1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991).

[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).

[0881] 297. 'Homeobox' domain signature and profile (home1)

The 'homeobox' is a protein domain of 60 amino acids [1 to 5,E1] first identified in a number of Drosophila homeotic
and segmentation proteins. It has since been found to be extremely well conserved in many other animals, including
vertebrates. This domain binds DNA through a helix-turn-helix type of structure. Some of the proteins which contain a
homeobox domain play an important role in development. Most of these proteins are known to be sequence-specific
DNA-binding transcription factors. The homeobox domain has also been found to be very similar to a region of the
yeast mating type proteins. These are sequence-specific DNA-binding proteins that act as master switches in yeast
differentiation by controlling gene expression in a cell type-specific fashion. A schematic representation of the homeobox
domain is shown below. The helix-turn-helix region is shown by the symbols 'H' (for helix), and 't' (for turn).
heme-binding domain of cytochrome b5 family.

[0865] Consensus pattern: [FY]-[LIVMK]-x(2)-H-P-[GA]-G [H is a heme axial ligand]

[ 1] Ozols J. Biochim. Biophys. Acta 997:121-130(1989).
[ 2] Guiard B. EMBO J. 4:3265-3272(1985).

[ 3] Calza R., Huttner E., Vincentz M., Rouze P., Galangau F., Vaucheret H., Cherel I., Meyer C., Kronenberger J.,
Caboche M. Mol. Gen. Genet. 209:552-562(1987).

[ 4] Crawford N.M., Smith M., Bellissimo D., Davis R.W. Proc. Natl. Acad. Sci. U.S.A. 85:5006-5010(1988).
[ 5] Guiard B., Lederer F. Eur. J. Biochem. 100:441-453(1979).

[ 6] Levin R.J., Boychuk P.L., Croniger C.M., Kazzaz J.A., Rozek C.E. Nucleic Acids Res. 17:6349-6367(1989).
[0866] 294. Hexapeptide-repeat containing-transferases signature

^"^^.^^'^^^^-^''"^^i'^^^^es have been proposed[1.2.3.4]to6elon9toa single family 
^TJif 'f"T ^'^'y"^^'^^^^ (EC 2.3.1.30) (SAT) (gene cysE). an enzyme invohr^ i„ c^Lu^Z 

synm^is. - Azotobacter chroococcum nitrogen fixation protein nif R NifP is most probably a SAT rwolve?in the ol 
nwat«n of "rtrogenase act,vrty. - Escherichia coli thiogalactoside acetyltransferase (EC 2 3. 1.18) (gene la^Tl^l 
zyme involved m the biosynthesis of lactose. - UDP-N-acetylglucosamine acyltransferas^^s 129) Se toxT) 
an en^me involved in the biosynthesis of lipid A a phosphorylated glycofi^d that anchors thT^^SS 
t^^ r ^"^^ "^ • UDP-3<H3-hydroxymyristoyl) glucosamine N-acyttransferaselic 2 3 M^ne 

?fi %.u ^- T'^k" '^^"^ "^^'^^^'^ o' '*P*'l ^ ■ ChlorampheniL acetyltransferase (CAT^^C 
gMM) f'om Agrobaclenum tumefaciens. Bacillus sphaericus. Escherichia coH plasmid IncRI NR79 Pseudomonas 

If l ^lS^^ i." ^ ^ ^^ '^^ acetyltransferase involved in the O^cetylation 

t°LiSi^^C 2^lT7^^ ^TT^rr - tetrahydrodipicoimate N-sulin^ 

tiBnsterase (EC 2JJJ17) (9er>e dapD) which catalyzes the fourth step in the biosynthesis of diaminooimelate and 

SSeilS^T rf"'"'"' • ^"'"^^ N-«cetylglucosamine-1-phosphate%«ransfe^7?SlS 
?™. ^ °' ^ ^"^^ '"^^"^^ " P^Pt«09lycan and BpopoVsaccharide biosynthesis - St^SS 
ecus aureus protein capG which is involved in biosynthesis of type 1 capsu4r polysaccharide - Ytost ^^«!!^ 

- Memanococcus jannaschi. hypotheticat protein l«J1064.These proteins have been shown (3 4] to wnte n a reoSt 
Z'^^^ITT^ °! '^"^"^ °' ^ ^'^■''^''^ f^e^epme which, in the tertiary s r^il^rfsT tes 

pop^dt^ ^ ^'^^ "^"^ ^'9^'"'^ is based on a Zurfold repeat Smih^a! 

[G?s>xSyM;;x;r^^^^^^^^ 

[ 1] Downie J.A. Mol. Microbiol. 3:1649-1651(1989).

[ 2] Parent R., Roy P.H. J. Bacteriol. 174:2891-2897(1992).

[ 3] Vaara M. FEMS Microbiol. Lett. 97:249-254(1992).

[ 4] Vuorio R., Haerkonen T., Tolvanen M., Vaara M. FEBS Lett. 337:289-292(1994).
[ 5] Raetz C.R.H., Roderick S.L. Science 270:997-1000(1995).

[0868] 295. Hexokinases signature. Hexokinase (EC 2.7.1.1) [1,2] is an important glycolytic enzyme that catalyzes
the phosphorylation of keto- and aldohexoses (e.g. glucose, mannose and fructose) using MgATP as the phosphoryl
donor. In vertebrates there are four major isoenzymes, commonly referred to as types I, II, III and IV. Type IV hexokinase,
which is often incorrectly designated glucokinase [3], is only expressed in liver and pancreatic beta-cells and plays an
important role in modulating insulin secretion; it is a protein of a molecular mass of about 50 Kd. Hexokinases of types
I to III, which have low Km values for glucose, have a molecular mass of about 100 Kd. Structurally they consist of a
very small N-terminal hydrophobic membrane-binding domain followed by two highly similar domains of 450 residues.
The first domain has lost its catalytic activity and has evolved into a regulatory domain. In yeast there are three different
isozymes: hexokinase PI (gene HXK1), PII (gene HXK2), and glucokinase (gene GLK1). All three proteins have a
molecular mass of about 50 Kd. All these enzymes contain one (or two in the case of types I to III isozymes) strongly
conserved region which has been shown [4] to be involved in substrate binding. A pattern from that region has been
derived.

[0870] [ 1] Middleton R.J. Biochem. Soc. Trans. 18:180-183(1990). [ 2] Griffin L.D., Gelb B.D., Wheeler D.A., Davison
D., Adams V., McCabe E.R.B. Genomics 11:1014-1024(1991). [ 3] Cornish-Bowden A., Cardenas M.L. Trends Biochem.

Sci. 16:281-282(1991). [ 4] Schirch D.M., Wilson J.E. Arch. Biochem. Biophys. 254:385-396(1987).
ROK1, a yeast protein. - ste13, a fission yeast protein. - Vasa, a Drosophila protein important for oocyte formation and
specification of embryonic posterior structures. - Me31B, a Drosophila maternally expressed protein of unknown function.
- dbpA, an Escherichia coli putative RNA helicase. - deaD, an Escherichia coli putative RNA helicase which can
suppress a mutation in the rpsB gene for ribosomal protein S2. - rhlB, an Escherichia coli putative RNA helicase. -
rhlE, an Escherichia coli putative RNA helicase. - srmB, an Escherichia coli protein that shows RNA-dependent ATPase
activity. It probably interacts with 23S ribosomal RNA. - Caenorhabditis elegans hypothetical proteins T26G10.1,
ZK512.2 and ZK686.2. - Yeast hypothetical protein YHR065c. - Yeast hypothetical protein YHR169w. - Fission yeast
hypothetical protein SpAC31A2.07c. - Bacillus subtilis hypothetical protein yxiN. All these proteins share a number of
conserved sequence motifs. Some of them are specific to this family while others are shared by other ATP-binding
proteins or by proteins belonging to the helicases 'superfamily' [4,E1]. One of these motifs, called the 'D-E-A-D-box',
represents a special version of the B motif of ATP-binding proteins. Some other proteins belong to a subfamily which
have His instead of the second Asp and are thus said to be 'D-E-A-H-box' proteins [3,5,6,E1]. Proteins currently known
to belong to this subfamily are: - PRP2, PRP16, PRP22 and PRP43. These yeast proteins are all involved in various
ATP-requiring steps of the pre-mRNA splicing process. - Fission yeast prh1, which may be involved in pre-mRNA splicing.

- Male-less (mle), a Drosophila protein required in males for dosage compensation of X chromosome linked genes. -
RAD3 from yeast. RAD3 is a DNA helicase involved in excision repair of DNA damaged by UV light, bulky adducts or
cross-linking agents. Fission yeast rad15 (rhp3) and mammalian DNA excision repair protein XPD (ERCC-2) are the
homologs of RAD3. - Yeast CHL1 (or CTF1), which is important for chromosome transmission and normal cell cycle
progression in G(2)/M. - Yeast TPS1. - Yeast hypothetical protein YKL078w. - Caenorhabditis elegans hypothetical
proteins C06E1.10 and K03H1.2. - Poxviruses' early transcription factor 70 Kd subunit, which acts with RNA polymerase
to initiate transcription from early gene promoters. - I8, a putative vaccinia virus helicase. - hrpA, an Escherichia coli
putative RNA helicase. Signature patterns were developed for both subfamilies.

[0861] Consensus pattern: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]
Consensus pattern: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR]

Note: proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop) (see the relevant
entry <PDOC00017>).

[ 1] Schmid S.R., Linder P. Mol. Microbiol. 6:283-292(1992).

[ 2] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122
(1989).

[ 3] Wassarman D.A., Steitz J.A. Nature 349:463-464(1991).

[ 4] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).

[ 5] Harosh I., Deschavanne P. Nucleic Acids Res. 19:6331-6331(1991).

[ 6] Koonin E.V., Senkevich T.G. J. Gen. Virol. 73:989-993(1992).
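
The consensus patterns in this document all follow the same notation: residues separated by hyphens, x for any residue, square brackets for allowed residues, curly braces for excluded residues, and a repeat count in parentheses. A small translator from that notation to a regular expression can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function name prosite_to_regex is hypothetical, the input is assumed to be a clean pattern without embedded spaces or anchoring symbols, and the two test sequences are invented for illustration.

```python
import re

# One pattern element: a core (residue, x, [..] or {..}) plus an optional (n) or (n,m)
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a hyphen-separated consensus pattern into a regex string."""
    parts = []
    for element in pattern.strip().strip("-").split("-"):
        match = _ELEMENT.fullmatch(element.strip())
        if match is None:
            raise ValueError(f"cannot parse element {element!r}")
        core, low, high = match.groups()
        if core == "x":
            atom = "."                         # any residue
        elif core.startswith("{") and core.endswith("}"):
            atom = "[^" + core[1:-1] + "]"     # excluded residues
        else:
            atom = core                        # single residue or [..] class
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(atom + repeat)
    return "".join(parts)


DEAD_BOX = prosite_to_regex("[LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]")
DEAH_BOX = prosite_to_regex("[GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR]")

# Hypothetical sequences, invented for illustration only
print(DEAD_BOX, bool(re.search(DEAD_BOX, "AAVLDEADRMLSS")))
print(DEAH_BOX, bool(re.search(DEAH_BOX, "KKGTLVIDEAHNQQ")))
```

Translating the notation once and reusing the result avoids hand-transcription errors when many patterns have to be searched.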

[0862] 293. Heme-binding domain in cytochrome b5 and oxidoreductases (heme_1)

[0863] Cytochrome b5 is a membrane-bound hemoprotein which acts as an electron carrier for several membrane-bound
oxygenases [1]. There are two homologous forms of b5, one found in microsomes and one found in the outer
membrane of mitochondria. Two conserved histidine residues serve as axial ligands for the heme group. The structure
of a number of oxidoreductases consists of the juxtaposition of a heme-binding domain homologous to that of b5 and
either a flavodehydrogenase or a molybdopterin domain. These enzymes are:

- Lactate dehydrogenase (EC 1.1.2.3) [2], an enzyme that consists of a flavodehydrogenase domain and a
heme-binding domain called cytochrome b2.

- Nitrate reductase (EC 1.6.6.1), a key enzyme involved in the first step of nitrate assimilation in plants, fungi and
bacteria [3,4]. Consists of a molybdopterin domain (see <PDOC00484>), a heme-binding domain called
cytochrome b557, as well as a cytochrome reductase domain.

- Sulfite oxidase (EC 1.8.3.1) [5], which catalyzes the terminal reaction in the oxidative degradation of sulfur-con- 
taining amino acids. Also consists of a molybdopterin domain and a heme-binding domain. 

This family of proteins also includes: 

- TU-36B, a Drosophila muscle protein of unknown function [6].
- Fission yeast hypothetical protein SpAC1F12.10c.

- Yeast hypothetical protein YMR073c.

- Yeast hypothetical protein YMR272c.

[0864] A segment which includes the first of the two histidine heme ligands was used as a signature pattern for the
[0848] 287. Histidine biosynthesis protein

[0849] Proteins involved in steps 4 and 6 of the histidine biosynthesis pathway are contained in this family. Histidine
is formed by several complex and distinct biochemical reactions catalysed by eight enzymes. The enzymes in this
Pfam entry are called His6 and His7 in eukaryotes and HisA and HisF in prokaryotes.

[0850] [1] Fani R, Tamburini E, Mori E, Lazcano A, Lio P, Barberio C, Casalone E, Cavalieri D, Perito B, Polsinelli
M; Gene 1997;197:9-17. [2] Fani R, Lio P, Chiarelli I, Bazzicalupo M; J Mol Evol 1994;38:489-495.
[0851] 288. Histone deacetylase family

[0852] Histones can be reversibly acetylated on several lysine residues. Regulation of transcription is caused in part
by this mechanism. Histone deacetylases catalyse the removal of the acetyl group. Histone deacetylases are related
to other proteins [1].

[0853] [1] Leipe DD, Landsman D; Nucleic Acids Res 1997;25:3693-3697.
[0854] 289. Histidinol dehydrogenase signature 

Histidinol dehydrogenase (EC 1.1.1.23) (HDH) catalyzes the terminal step in the biosynthesis of histidine in bacteria,
fungi, and plants, the four-electron oxidation of L-histidinol to histidine. In bacteria HDH is a single chain polypeptide;
in fungi it is the C-terminal domain of a multifunctional enzyme which catalyzes three different steps of histidine
biosynthesis; and in plants it is expressed as a nuclear encoded protein precursor which is exported to the chloroplast [1].
As a signature pattern a highly conserved region located in the central part of HDH was selected. This r^^s not

SZ^l^ t"^' " contains a cysteine^Jsi'd^^Ti 2 

r^^r Jf ^''"""""'^ ^ '2] to be important for the catalytic activity of the enzyme 

4^^m^^^y ^" ' ■ ''^ ^ ' ^- J- Natl. Acad. Sci. U.S.A 88: 

[ 2] Grubmeyer C.T., Gray W.R. Biochemistry 25:4778-4784(1986).

[0856] 290. Homoserine dehydrogenase signature 

Homoserine dehydrogenase (EC 1.1.1.3) (HDh) [1,2] catalyzes NAD-dependent reduction of aspartate beta-semialdehyde
into homoserine. This reaction is the third step in a pathway leading from aspartate to homoserine. The latter
participates in the biosynthesis of threonine and then isoleucine as well as in that of methionine. HDh is found either
as a single chain protein, as in some bacteria and yeast, or as a bifunctional enzyme consisting of an N-terminal
aspartokinase domain and a C-terminal HDh domain, as in bacteria such as Escherichia coli and in plants. As a signature
pattern, the best conserved region of HDh has been selected. This is a segment of 23 to 24 residues located in the
central section and that contains two conserved aspartate residues.

[0857] Consensus pattern: A-x(3)-G-[LIVMFY]-[STAG]-x(2,3)-[DNS]-P-x(2)-D-[LIVM]-x-G-x-D-x(3)-K

[ 1] Thomas D., Barbey R., Surdin-Kerjan Y. FEBS Lett. 323:289-293(1993).
[ 2] Cami B., Clepet C., Patte J.-C. Biochimie 75:487-495(1993).

[0858] 291. Haloacid dehalogenase-like hydrolase

[0859] This family is structurally different from the alpha/beta hydrolase family (abhydrolase). This family includes
L-2-haloacid dehalogenase, epoxide hydrolases and phosphatases. The structure of the family consists of two

^::iS:f''^rp\"S?";hTe^^^ ^ '^T' '^^'^^ ^^^^ ^ a.ignmenrS:?en rdut 

[0860] 292. DEAD and DEAH box families ATP-dependent helicases signatures (helicase_C)
A number of eukaryotic and prokaryotic proteins have been characterized [1,2,3] on the basis of their structural similarity.
They all seem to be involved in ATP-dependent, nucleic-acid unwinding. Proteins currently known to belong to
this family are: - Initiation factor eIF-4A. Found in eukaryotes, this protein is a subunit of a high molecular weight
complex involved in 5' cap recognition and the binding of mRNA to ribosomes. It is an ATP-dependent RNA helicase.
- PRP5 and PRP28. These yeast proteins are involved in various ATP-requiring steps of the pre-mRNA splicing process.

- PL10, a mouse protein expressed specifically during spermatogenesis. - An3, a Xenopus putative RNA helicase

iSt^P 10 ' Caenorh^t'f T.""^'- ^^^^ ^'""^ '""^'^ ^^^^ Jr^trs^g^nd 

^^^^ ^"^'^ • ^ f*"^^ '^^"^ H'itochondrial sprk:lng 

ATPase and DNA-helicase activities in vitro. It is involved in cell growth and division. - Rm62 (p62), a Drosophila
putative RNA helicase related to p68. - DBP2, a yeast protein related to p68. - DHH1, a yeast protein. - DRS1, a yeast
protein involved in ribosome assembly. - MAK5, a yeast protein involved in maintenance of dsRNA killer plasmid.
section of these proteins; the two others on conserved regions located in the central part of the sequence.
[0838] Consensus pattern: [IV]-D-L-G-T-[ST]-x-[SC]

Consensus pattern: [LIVMF]-[LIVMFY]-[DN]-[LIVMFS]-G-[GSH]-[GS]-[AST]-x(3)-[ST]-[LIVM]-[LIVMFC]
Consensus pattern: [LIVMY]-x-[LIVMF]-x-G-G-x-[ST]-x-[LIVM]-P-x-[LIVM]-x-[DEQKRST]

[ 1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988).
[ 2] Pelham H.R.B. Cell 46:959-961(1986).

[ 3] Pelham H.R.B. Nature 332:776-777(1988). [ 4] Craig E.A. BioEssays 11:48-52(1989).

[ 5] Agranovsky A.A., Boyko V.P., Karasev A.V., Koonin E.V., Dolja V.V. J. Mol. Biol. 217:603-610(1991).

[ 6] Gupta R.S., Singh B. J. Bacteriol. 174:4594-4605(1992).

[ 7] Deshaies R.J., Koch B.D., Schekman R. Trends Biochem. Sci. 13:384-388(1988).

[ 8] Craig E.A., Gross C.A. Trends Biochem. Sci. 16:135-140(1991).

[0839] 283. Heat shock hsp90 proteins family signature 

Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by the induction of the 
synthesis of proteins collectively known as heat-shock proteins (hsp) [1]. Amongst them is a family of proteins, with 
an average molecular weight of 90 Kd, known as the hsp90 proteins. Proteins known to belong to this family are: -
Escherichia coli and other bacteria heat shock protein C62.5 (gene htpG). - Vertebrate hsp 90-alpha (hsp 86) and hsp
90-beta (hsp 84). - Drosophila hsp 82 (hsp 83). - Trypanosoma cruzi hsp 85. - Plants Hsp82 or Hsp83. - Yeast and
other fungi HSC82 and HSP82. - The endoplasmic reticulum protein 'endoplasmin' (also known as Erp99 in mouse,
GRP94 in hamster, and hsp 108 in chicken). The exact function of hsp90 proteins is not yet known. In higher eukaryotes,
hsp90 has been found associated with steroid hormone receptors, with tyrosine kinase oncogene products of several
retroviruses, with eIF2alpha kinase, and with actin and tubulin. Hsp90 are probable chaperonins that possess ATPase
activity [2,3]. As a signature pattern for the hsp90 family of proteins, a highly conserved region found in the N-terminal
part of these proteins was selected.

[0840] Consensus pattern: Y-x-[NQH]-K-[DE]-[IVA]-F-L-R-[ED]

[ 1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988).

[ 2] Nadeau K., Das A., Walsh C.T. J. Biol. Chem. 268:1479-1487(1993).

[ 3] Jakob U., Buchner J. Trends Biochem. Sci. 19:205-211(1994).

[0841] 284. Helix-turn-helix (HTH3) 

[0842] This large family of DNA binding helix-turn helix proteins includes Cro Swiss:P03036 and CI Swiss: P03034 
[0843] 285. Heme oxygenase signature '' ' 

Heme oxygenase (EC 1.14.99.3) (HO) [1] is the microsomal enzyme that, in animals, carries out the oxidation of heme;
it cleaves the heme ring at the alpha-methene bridge to form biliverdin and carbon monoxide. Biliverdin is subsequently
converted to bilirubin by biliverdin reductase. In mammals there are three isozymes of heme oxygenase: HO-1 to HO-3.
The first two isozymes differ in their tissue expression and their inducibility: HO-1 is highly inducible by its substrate
heme and by various non-heme substances, while HO-2 is non-inducible. It has been suggested [2] that HO-2 could
be implicated in the production of carbon monoxide in the brain, where it is said to act as a neurotransmitter. In the
genome of the chloroplast of red algae as well as in cyanobacteria, there is a heme oxygenase (gene pbsA) that is the
key enzyme in the synthesis of the chromophoric part of the photosynthetic antennae [3]. A heme oxygenase is also
present in the bacterium Corynebacterium diphtheriae (gene hmuO), where it is involved in the acquisition of iron from
the host heme [4]. There is, in the central section of these enzymes, a well conserved region centered on a histidine
residue which is proposed to play a key role in binding the substrate heme at the active center of the enzyme. This
region was used as a signature pattern.

[0844] Consensus pattern: L-[IV]-A-H-[STACH]-Y-[STV]-[RT]-Y-[LIVM]-G [H binds the heme]

[ 1] Maines M.D. FASEB J. 2:2557-2568(1988).
[ 2] Barinaga M. Science 259:309-309(1993).

[ 3] Richaud C., Zabulon G. Proc. Natl. Acad. Sci. U.S.A. 94:11736-11741(1997).
[ 4] Schmitt M.P. J. Bacteriol. 179:838-845(1997).

[0845] 286. Hepatitis core antigen. 

[0846] The core antigen of hepatitis viruses possesses a carboxyl terminus rich in arginine. On this basis it was
predicted that the core antigen would bind DNA [1]. There is some experimental evidence to support this [2].
[0847] [1] Pasek M, Goto T, Gilbert W, Zink B, Schaller H, Mackay P, Leadbetter G, Murray K; Nature 1979;282:
575-579. [2] Gallina A, Bonelli F, Zentilin L, Rindi G, Muttini M, Milanesi G; J Virol 1989;63:4645-4652.
[0831] 280. HSF-type DNA-binding domain signature

[0832] Heat shock factor (HSF) is a DNA-binding protein that specifically binds heat shock promoter elements (HSE).
HSE is a palindromic element rich with repetitive purine and pyrimidine motifs: 5'-nGAAnnTTCnnGAAnnTTCn-3'. HSF
is expressed at normal temperatures but is activated by heat shock or chemical stressors [1,2]. The sequences of HSF
from various species show extensive similarity in a region of about 90 amino acids, which has been shown [3] to bind
DNA. Some other proteins also contain a HSF domain, these are: - Yeast SFL1, a protein involved in cell surface
assembly and regulation of the gene related to flocculation (asexual cell aggregation) [4]. - Yeast transcription factor
SKN7 (or BRY1 or POS9), which binds to the promoter elements SCB and MCB essential for the control of G1 cyclins
expression [5]. - Yeast hypothetical protein YJR147W. A pattern was derived from the most conserved part of the
HSF DNA-binding domain, its central region.
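
The HSE consensus quoted in this entry, 5'-nGAAnnTTCnnGAAnnTTCn-3', can be used to scan a nucleotide sequence. The following is only a minimal sketch, assuming that 'n' stands for any of A, C, G or T. The function names and the toy promoter string are invented for illustration and do not come from the source.

```python
import re

# HSE consensus as quoted in the text; 'n' is assumed to mean any nucleotide.
HSE = "nGAAnnTTCnnGAAnnTTCn"


def motif_to_regex(motif: str) -> str:
    """Turn a motif with 'n' wildcards into a regular expression string."""
    return "".join("[ACGT]" if base == "n" else base for base in motif)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; the wildcard 'n' maps to itself."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "n": "n"}
    return "".join(complement[base] for base in reversed(seq))


hse_regex = re.compile(motif_to_regex(HSE))

# Made-up sequence with one HSE-like site embedded at 0-based position 3.
toy_promoter = "CCTAGAACGTTCTAGAAGCTTCGAGG"
hit = hse_regex.search(toy_promoter)
if hit is not None:
    print(hit.start(), hit.group())

# The motif reads the same on both strands, which is what "palindromic" means here.
print(reverse_complement(HSE) == HSE)
```

Because the consensus equals its own reverse complement, scanning one strand is enough for this particular motif.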

[0833] Consensus pattern: L-x(3)-[FY]-K-H-x-N-x-[STAN]-S-F-[LIVM]-R-Q-L-[NH]-x-Y-x-[FYW]-[RKH]-K-[LIVM]-

[ 1] Sorger P.K. Cell 65:363-366(1991).

[ 2] Mager W.H., Moradas Ferreira P. Biochem. J. 290:1-13(1993).

[ 3] Vuister G.W., Kim S.-J., Orosz A., Marquardt J., Wu C., Bax A. Nat. Struct. Biol. 1:605-613(1994).
[ 4] Fujita A., Kikuchi Y., Kuhara S., Misumi Y., Matsumoto S., Kobayashi H. Gene 85:321-328(1989).
[ 5] Morgan B.A., Bouquin N., Merrill G.F., Johnston L.H. EMBO J. 14:5679-5689(1995).

[0834] 281. Heat shock hsp20 proteins family profile

Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by inducing the synthesis
of proteins collectively known as heat-shock proteins (hsp) [1]. Amongst them is a family of proteins with an average
molecular weight of 20 Kd, known as the hsp20 proteins [2 to 5]. These seem to act as chaperones that can protect
other proteins against heat-induced denaturation and aggregation. Hsp20 proteins seem to form large heterooligomeric
aggregates; their family is currently composed of the following members: - Vertebrate heat shock protein hsp27 (hsp25)
[remainder of this entry illegible in source]. - Drosophila heat shock proteins hsp22, hsp23, hsp26, hsp27, hsp67BA
and BC. - Caenorhabditis elegans hsp16 multigene family. - Fungal HSP26 (budding yeast) and hsp30 (Neurospora
crassa and Aspergillus nidulans). - Plant small hsp's. Plants have four classes of hsp20: classes I and II which are
cytoplasmic, class III which is chloroplastic and class IV which is found in the endomembrane. - Alpha-crystallin A and
B chains. Alpha-crystallin is an abundant constituent of the eye lens of most vertebrate species. Its main function
appears to be to maintain the correct refractive index of the lens. It is also found in other tissues where it seems to act
as a chaperone [6]. - Schistosoma mansoni major egg antigen p40. Structurally, p40 is built of two tandem hsp20
domains. - A variety of prokaryotic proteins: ibpA and ibpB from Escherichia coli, hsp18 from Clostridium
acetobutylicum, spore protein SP21 (hspA) from Stigmatella aurantiaca, Mycobacterium leprae 18 Kd antigen and
Mycobacterium tuberculosis 14 Kd antigen. - Methanococcus jannaschii hypothetical protein MJ0285. Structurally, this
family is characterized by the presence of a conserved C-terminal domain of about 100 residues. The profile developed
to detect members of the hsp20 family is based on an alignment of this domain.

[0835] -Sequences known to belong to this class detected by the profile: ALL.

[ 1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988).
[ 2] de Jong W.W., Leunissen J.A.M., Voorter C.E.M. Mol. Biol. Evol. 10:103-126(1993).
[ 3] Caspers G.J., Leunissen J.A.M., de Jong W.W. J. Mol. Evol. 40:238-248(1995).
[ 4] Jaenicke R., Creighton T.E. Curr. Biol. 3:234-235(1993).
[ 5] Jakob U., Buchner J. Trends Biochem. Sci. 19:205-211(1994).
[ 6] Groenen P.J.T.A., Merck K.B., de Jong W.W., Bloemendal H. Eur. J. Biochem. 225:1-9(1994).

[0836] 282. Heat shock hsp70 proteins family signatures

[0837] Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by the induction
of the synthesis of proteins collectively known as heat-shock proteins (hsp) [1]. Amongst them is a family of proteins
with an average molecular weight of 70 Kd, known as the hsp70 proteins [2,3,4]. In most species, there are many
proteins that belong to the hsp70 family. Hsp70 proteins can be found in different cellular compartments (nuclear,
cytosolic, mitochondrial, endoplasmic reticulum, etc.). Some of the hsp70 family proteins are listed below: - In
Escherichia coli and other bacteria, the main hsp70 protein is known as the dnaK protein. dnaK is also found in the
chloroplast genome of red algae. - In yeast, at least ten hsp70 proteins are known to exist: SSA1 to SSA4, SSB1,
SSB2, SSC1, SSD1 (KAR2), SSE1 (MSI3) and SSE2. - In Drosophila, there are at least eight different hsp70 proteins:
HSP70, HSP68, and HSC-1 to HSC-6. - In mammals, there are at least eight different proteins: HSPA1 to HSPA6,
HSC70, and GRP78 (also known as the immunoglobulin heavy chain binding protein (BiP)). - In the sugar beet yellow
virus (SBYV), a hsp70 homolog has been shown [5] to exist. - In archaebacteria, hsp70 proteins are also present [6].
All proteins belonging to the hsp70 family bind ATP. A variety of functions has been postulated for hsp70 proteins. It
now appears [7] that some hsp70 proteins play an important role in the transport of proteins across membranes. They
also seem to be involved in protein folding and in the assembly/disassembly of protein complexes [8]. Three signature
patterns for the hsp70 family of proteins were derived; the first centered on a conserved pentapeptide found in the
N-terminal






EP 1 033 405 A2 



of structural genes for phosphorus acquisition. - Fission yeast protein esc1 which is involved in the sexual differentiation
process. The schematic representation of the helix-loop-helix domain is shown here:

xxxxxxxxxxxxxxxxxxxxxxxx ------ xxxxxxxxxxxxxxxxxxxxxxx
  Amphipathic helix 1     Loop    Amphipathic helix 2

The signature pattern developed to detect this domain completely spans the second amphipathic helix.

[0818] Consensus pattern: [DENSTAP]-[KTR]-[LIVMAGSNT]-{FYWCPHKR}-[LIVMT]-[LIVM]-x(2)-[STAV]-
[LIVMSTACKR]-x-[VMFYH]-[LIVMTA]-{P}-{P}-[LIVMRKHQ]-

[ 1] Murre C., McCaw P.S., Baltimore D. Cell 56:777-783(1989).
[ 2] Garrell J., Campuzano S. BioEssays 13:493-498(1991).
[ 3] Kato G.J., Dang C.V. FASEB J. 6:3065-3072(1992).

[ 4] Krause M., Fire A., Harrison S.W., Priess J., Weintraub H. Cell 63:907-919(1990).
[ 5] Riechmann V., van Cruechten I., Sablitzky F. Nucleic Acids Res. 22:749-755(1994).

[0819] 276. HMG14 and HMG17 signature

High mobility group (HMG) proteins are a family of relatively low molecular weight nonhistone components in chromatin.
HMG14 and HMG17 [1], two related proteins of about 100 amino acid residues, bind to the inner side of the nucleosomal
DNA, thus altering the interaction between the DNA and the histone octamer. These two proteins may be involved in
the process which maintains transcribable genes in a unique chromatin conformation. The trout nonhistone
chromosomal protein H6 (histone T) also belongs to this family. As a signature pattern a conserved stretch of 10
residues located in the N-terminal section of HMG14 and HMG17 was selected.
[0820] Consensus pattern: R-R-S-A-R-L-S-A-[RK]-P-

[0821] [ 1] Bustin M., Reeves R. Prog. Nucleic Acid Res. Mol. Biol. 54:35-100(1996).
[0822] 277. Hydroxymethylglutaryl-coenzyme A lyase active site (HMGL1)

3-hydroxy-3-methylglutaryl-coenzyme A lyase (HMG-CoA lyase or HL) (EC 4.1.3.4) catalyzes the transformation of
HMG-CoA into acetyl-CoA and acetoacetate. In vertebrates it is a mitochondrial enzyme which is involved in ketogenesis
and in leucine catabolism [1]. In some bacteria, such as Pseudomonas mevalonii, it is involved in mevalonate
catabolism (gene mvaB). A cysteine has been shown [2], in mvaB, to be required for the activity of the enzyme. The
region around this residue is perfectly conserved and is used as a signature pattern.
[0823] Consensus pattern: S-V-A-G-L-G-G-C-P-Y [C is the active site residue]-

[ 1] Mitchell G.A., Robert M.-F., Hruz P.W., Wang S., Fontaine G., Behnke C.E., Mende-Mueller L.M., Schappert
K., Lee C., Gibson K.M., Miziorko H.M. J. Biol. Chem. 268:4376-4381(1993).
[ 2] Hruz P.W., Narasimhan C., Miziorko H.M. Biochemistry 31:6842-6847(1992).
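
The consensus patterns in these entries use a compact notation: residues separated by hyphens, 'x' for any residue, square brackets for allowed alternatives, curly braces for excluded residues, and a repeat count in parentheses. As an illustration only, the sketch below translates that notation into a Python regular expression and applies it to the HMG-CoA lyase pattern. The function name `prosite_to_regex` and the toy protein string are assumptions made for this example, and anchoring symbols are not handled.

```python
import re

# One pattern element: a residue, 'x', an [allowed] set or an {excluded} set,
# optionally followed by a repeat count such as (3) or (2,3).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a hyphen-separated consensus pattern into a regex string."""
    parts = []
    for element in pattern.strip("-").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


HMGL_PATTERN = "S-V-A-G-L-G-G-C-P-Y"            # pattern quoted in the text
hmgl_regex = re.compile(prosite_to_regex(HMGL_PATTERN))

toy_protein = "MKTSVAGLGGCPYAAN"                # made-up sequence, not a real protein
match = hmgl_regex.search(toy_protein)
if match is not None:
    # The active site cysteine is the 8th residue of the pattern (offset 7).
    active_site_cys = match.start() + 7
    print(match.start(), toy_protein[active_site_cys])
```

The same helper accepts patterns with wildcards and repeat counts, for example `prosite_to_regex("L-R-[DE]-G-x-Q-x(10)-K-")`.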

[0824] Alpha-isopropylmalate and homocitrate synthases signatures (HMGL2)

The following enzymes have been shown [1] to be functionally as well as evolutionarily related: - Alpha-isopropylmalate
synthase (EC 4.1.3.12), which catalyzes the first step in the biosynthesis of leucine, the condensation of acetyl-CoA
and alpha-ketoisovalerate to form 2-isopropylmalate. - Homocitrate synthase (EC 4.1.3.21) (gene nifV), which
is involved in the biosynthesis of the iron-molybdenum cofactor of nitrogenase and catalyzes the condensation of
acetyl-CoA and alpha-ketoglutarate into homocitrate. - Soybean late nodulin 56. - Methanococcus jannaschii
hypothetical proteins MJ0503, MJ1195 and MJ1392. Two conserved regions were selected as signature patterns for
these enzymes. The first region is located in the N-terminal section while the second region is located in the central
section and contains two conserved histidine residues which could be implicated in the catalytic mechanism.
[0825] Consensus pattern: L-R-[DE]-G-x-Q-x(10)-K-
Consensus pattern: [LIVMFW]-x(2)-H-x-H-[DN]-D-x-G-x-[GAS]-x-[GASLI]-
[0826] [ 1] Wang S.-Z., Dean D.R., Chen J.-S., Johnson J.L. J. Bacteriol. 173:3041-3046(1991).
[0827] 278. (HMG COA synt) Hydroxymethylglutaryl-coenzyme A synthase active site

Hydroxymethylglutaryl-coenzyme A synthase (EC 4.1.3.5) (HMG-CoA synthase) catalyzes the condensation of
acetyl-CoA with acetoacetyl-CoA to produce HMG-CoA and CoA [1]. In vertebrates there are two isozymes located in
different subcellular compartments: a cytosolic form, which is the starting point of the mevalonate pathway that leads
to cholesterol and other sterolic and isoprenoid compounds, and a mitochondrial form responsible for ketone body
biosynthesis. HMG-CoA synthase is also found in other eukaryotes such as insects, plants and fungi. A cysteine is
known to act as the catalytic nucleophile in the first step of the reaction, the acetylation of the enzyme by acetyl-CoA.
The conserved region around this active site residue was used as a signature pattern.

[0828] Consensus pattern: N-x-[DN]-[IV]-E-G-[IV]-D-x(2)-N-A-C-[FY]-x-G [C is the active site residue]-

[0829] [ 1] Rokosz L.L., Boulton D.A., Butkiewicz E.A., Sanyal G., Cueto M.A., Lachance P.A., Hermes J.D. Arch.
Biochem. Biophys. 312:1-13(1994).

[0830] 279. HMG (high mobility group) box 
Huebner K. Biochemistry 35:11529-11535(1996).

"^"^ ° ' ^ ° - ^"ste*" J M. Nat. Sfruct. Bid. 4: 

[0817] 275. Myc-type, 'helix-loop-helix' dimerization domain signature (HLH)

A number of eukaryotic proteins, which probably are sequence specific DNA-binding proteins that act as transcription
factors, share a conserved domain of 40 to 50 amino acid residues. It has been proposed [1] that this domain is formed
of two amphipathic helices joined by a variable length linker region that could form a loop. This 'helix-loop-helix' (HLH)
domain mediates protein dimerization and has been found in the proteins listed below [2,3,E1,E2]. Most of these
proteins have an extra basic region of about 15 amino acid residues that is adjacent to the HLH domain and specifically
binds to DNA. They are referred to as basic helix-loop-helix proteins (bHLH), and are classified in two groups: class A
(ubiquitous) and class B (tissue-specific). Members of the bHLH family bind variations on the core sequence 'CANNTG',
also referred to as the E-box motif. The homo- or heterodimerization mediated by the HLH domain is independent of,
but necessary for, DNA binding, as two basic regions are required for DNA binding activity. The HLH proteins lacking
the basic domain (Emc, Id) function as negative regulators since they form heterodimers, but fail to bind DNA. The
hairy-related proteins (hairy, E(spl), deadpan) also repress transcription although they can bind DNA. The proteins of
this subfamily act together with co-repressor proteins, like groucho, through their C-terminal motif WRPW. - The myc
family of cellular oncogenes [4], which is currently known to contain four members: c-myc [E3], N-myc, L-myc, and
B-myc. The myc genes are thought to play a role in cellular differentiation and proliferation. - Proteins involved in
myogenesis (the induction of muscle cells). In mammals MyoD1 (Myf-3), myogenin (Myf-4), Myf-5, and Myf-6 (Mrf4 or
herculin), in birds CMD1 (QMF-1), in Xenopus MyoD and MF25, in Caenorhabditis elegans CeMyoD, and in Drosophila
nautilus (nau). - Vertebrate proteins that bind specific DNA sequences ('E boxes') in various immunoglobulin chains
enhancers: E2A or ITF-1 (E12/pan-2 and E47/pan-1), ITF-2 (tcf4), TFE3, and TFEB. - Vertebrate neurogenic
differentiation factor 1 that acts as differentiation factor during neurogenesis. - Vertebrate MAX protein, a transcription
regulator that forms a sequence-specific DNA-binding protein complex with myc or mad. - Vertebrate Max Interacting
Protein 1 (MXI1 protein) which acts as a transcriptional repressor and may antagonize myc transcriptional activity by
competing for max. - Proteins of the bHLH/PAS superfamily which are transcriptional activators. In mammals, AH
receptor nuclear translocator (ARNT), single-minded homologs (SIM1 and SIM2), hypoxia-inducible factor 1 alpha
(HIF1A), AH receptor (AHR), neuronal pas domain proteins (NPAS1 and NPAS2), endothelial pas domain protein 1
(EPAS1), mouse ARNT2, and human BMAL1. In Drosophila, single-minded (SIM), AH receptor nuclear translocator
(ARNT), trachealess protein (TRH), and similar protein (SIMA). - Mammalian transcription factors HES, which repress
transcription by acting on two types of DNA sequences, the E box and the N box. - Mammalian MAD protein (max
dimerizer) which acts as transcriptional repressor and may antagonize myc transcriptional activity by competing for
max. - Mammalian Upstream Stimulatory Factor 1 and 2 (USF1 and USF2), which bind to a symmetrical DNA sequence
that is found in a variety of viral and cellular promoters. - Human lyl-1 protein, which is involved, by chromosomal
translocation, in T-cell leukemia. - Human transcription factor AP-4. - Mouse helix-loop-helix proteins MATH-1 and
MATH-2 which activate E box-dependent transcription in collaboration with E47. - Mammalian stem cell protein (SCL)
(also known as tal1), a protein which may play an important role in hemopoietic differentiation. SCL is involved, by
chromosomal translocation, in stem-cell leukemia. - Mammalian proteins Id1 to Id4 [5]. Id (inhibitor of DNA binding)
proteins lack a basic DNA-binding domain but are able to form heterodimers with other HLH proteins, thereby inhibiting
binding to DNA. - Drosophila extra-macrochaetae (emc) protein, which participates in sensory organ patterning by
antagonizing the neurogenic activity of the achaete-scute complex. Emc is the homolog of mammalian Id proteins. -
Human Sterol Regulatory Element Binding Protein 1 (SREBP-1), a transcriptional activator that binds to the sterol
regulatory element 1 (SRE-1) found in the flanking region of the LDLR gene and in other genes. - Drosophila
achaete-scute (AS-C) complex proteins T3 (l'sc), T4 (scute), T5 (achaete) and T8 (asense). The AS-C proteins are
involved in the determination of the neuronal precursors in the peripheral nervous system and the central nervous
system. - Mammalian homologs of achaete-scute proteins, the MASH-1 and MASH-2 proteins. - Drosophila atonal
protein (ato) which is involved in neurogenesis. - Drosophila daughterless (da) protein, which is essential for
neurogenesis and sex-determination. - Drosophila deadpan (dpn), a hairy-like protein involved in the functional
differentiation of neurons. - Drosophila delilah (dei) protein, which plays an important role in the differentiation of
epidermal cells into muscle. - Drosophila hairy (h) protein, a transcriptional repressor which regulates the embryonic
segmentation and adult bristle patterning. - Drosophila enhancer of split proteins E(spl), that are hairy-like proteins
active during neurogenesis and also act as transcriptional repressors. - Drosophila twist (twi) protein, which is involved
in the establishment of germ layers in embryos. - Maize anthocyanin regulatory proteins R-S and LC. - Yeast
centromere-binding protein 1 (CPF1 or CBF1). This protein is involved in chromosomal segregation. It binds to a highly
conserved DNA sequence found in centromeres and in several promoters. - Yeast INO2 and INO4 proteins. - Yeast
phosphate system positive regulatory protein PHO4 which interacts with the upstream activating sequence of several
acid phosphatase genes. - Yeast serine-rich protein TYE7 that is required for ty-mediated ADH2 expression. -
Neurospora crassa nuc-1, a protein that activates the transcription
[0806] 270. Glyceraldehyde 3-phosphate dehydrogenase active site (gpdh)

Glyceraldehyde 3-phosphate dehydrogenase (EC 1.2.1.12) (GAPDH) [1] is a tetrameric NAD-binding enzyme common
to both the glycolytic and gluconeogenic pathways. A cysteine in the middle of the molecule is involved in forming a
covalent phosphoglycerol thioester intermediate. The sequence around this cysteine is totally conserved in eubacterial
and eukaryotic GAPDHs and is also present, albeit in a variant form, in the otherwise highly divergent archaebacterial
GAPDH [2]. Escherichia coli D-erythrose 4-phosphate dehydrogenase (E4PDH) (gene epd or gapB) is an enzyme highly
related to GAPDH [3].

[0807] Consensus pattern: [ASV]-S-C-[NT]-T-x(2)-[LIM] [C is the active site residue]-

[ 1] Harris J.I., Waters M. (In) The Enzymes (3rd edition) 13:1-50(1976).

[ 2] Fabry S., Lang J., Niermann T., Vingron M., Hensel R. Eur. J. Biochem. 179:405-413(1989).

[ 3] Zhao G., Pease A.J., Bharani N., Winkler M.E. J. Bacteriol. 177:2804-2812(1995).

[0808] 271. Granulins signature

Granulins [1] are a family of cysteine-rich peptides of about 6 Kd which may have multiple biological activities. A precursor
protein (known as acrogranin) potentially encodes seven different forms of granulin (grnA to grnG) which are probably
released by post-translational proteolytic processing. A schematic representation of the structure of a granulin is shown
below:

xxxCxxxxxCxx)oo(CCxxxxx)oo(CCxxxx)ocCCxxx)ocCC)oooocCxx^ conserved cystehe probably 

involved in a disulfide bond. '*': position of the pattern. Granulins are evolutionarily related to PMP-D1, a peptide
extracted from the pars intercerebralis of migratory locusts [2].

[0809] Consensus pattern: C-x-D-x(2)-H-C-C-P-x(4)-C [The four C's are probably involved in disulfide bonds]-

[ 1] Bhandari V., Palfree R.G., Bateman A. Proc. Natl. Acad. Sci. U.S.A. 89:1715-1719(1992).
[ 2] Nakakura N., Hietter H., van Dorsselaer A., Luu B. Eur. J. Biochem. 204:147-153(1992).

[0810] 272. (HCV RdRp) Hepatitis C virus RNA dependent RNA polymerase

[0811] The RNA dependent RNA polymerase is also known as non-structural protein NS5B. NS5B is a 65 kDa protein
that resembles other viral RNA polymerases. HCV replication is thought to occur in membrane-bound replication
complexes. These complexes transcribe the positive strand, and the resulting minus strand is used as a template for
the synthesis of genomic RNA. There are two viral proteins involved in the reaction, NS3 and NS5B [1,2].
[0812] [1] Lohmann V, Korner F, Herian U, Bartenschlager R;
J Virol 1997;71:8416-8428. [2] Behrens SE, Tomei L, De Francesco R;
EMBO J 1996;15:12-22. [3] Ishido S, Fujita T, Hotta H;
Biochem Biophys Res Commun 1998;244:35-40.
[0813] 273. (HHH) Helix-hairpin-helix motif.

[0814] [1] Doherty AJ, Serpell LC, Ponting CP; Nucleic Acids Res 1996;24:2488-2497
[0815] 274. HIT family signature

Recently a family of small proteins of about 12 to 16 Kd has been described [1]. This family currently consists of: -
Mammalian protein HINT (also known as Protein kinase C Inhibitor 1 or PKCI-1). HINT was incorrectly thought to be
a specific inhibitor of PKC. It has been shown to bind zinc. - Fission yeast diadenosine 5',5'''-P1,P4-tetraphosphate
asymmetrical hydrolase (Ap4Aase) (EC 3.6.1.17) [2] (gene aph1), which cleaves A-5'-PPPP-5'A to yield AMP and ATP.
- FHIT, a human protein whose gene is altered in different tumors and which acts [3] as a diadenosine
5',5'''-P1,P3-triphosphate hydrolase (Ap3Aase) (EC 3.6.1.29), cleaving A-5'-PPP-5'A to yield AMP and ADP. - Yeast
proteins HNT1 and HNT2. - Maize zinc-binding protein ZBP14. - Escherichia coli hypothetical protein ycfF. -
Haemophilus influenzae hypothetical protein HI0961. - Helicobacter pylori hypothetical protein HP0404. -
Methanococcus jannaschii hypothetical protein MJ0866. - Mycobacterium leprae hypothetical protein U296A. -
Synechocystis strain PCC 6803 hypothetical protein slr1234. - Caenorhabditis elegans hypothetical protein F21C3.3. -
A hypothetical 13.2 Kd protein in hisE 3'region in Azospirillum brasilense. - A hypothetical 13.1 Kd protein in p37
5'region in Mycoplasma hyorhinis. - A hypothetical 12.4 Kd protein in psbAII 5'region in Synechococcus strain PCC
7942. All these proteins contain a region with three clustered histidines. This region is responsible for the designation
of this family: HIT, for 'HIstidine Triad' [1]. This region was originally thought to be implicated in the binding of a zinc
ion but was later identified [4] as part of the alpha-phosphate binding site of a nucleotide-binding domain. As a
signature pattern, the region of the histidine triad was selected.
[0816] Consensus pattern: [NQA]-x(4)-[GAV]-x-[QF]-x-[LIVM]-x-H-[LIVMFYT]-H-[LIVMFT]-H-[LIVMF](2)-[PSGA]-

[ 1] Seraphin B. DNA Seq. 3:177-179(1992).

[ 2] Huang Y., Garrison P.N., Barnes L.D. Biochem. J. 312:925-932(1995).

[ 3] Barnes L.D., Garrison P.N., Siprashvili Z., Guranowski A., Robinson A.K., Ingram S.W., Croce C.M., Ohta M.,






One of the conserved regions in these enzymes is centered on a conserved aspartic acid residue which has been
shown [3], in Aspergillus wentii beta-glucosidase A3, to be implicated in the catalytic mechanism. This region was used
as a signature pattern.

iSsitf ITsall^r ^"^"^ I'^^5(^H*^^W^Q»^-^(4)^HLI VMFT^^ jD is the 

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Castle L.A., Smith K.D., Morris R.O. J. Bacteriol. 174:1478-1486(1992).
[ 3] Bause E., Legler G. Biochim. Biophys. Acta 626:459-465(1980).

[0803] 268. Glycosyl hydrolases family 8 signature

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC
3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria produce
a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can
be classified into families. One of these families is known as the cellulase family D [3] or as the glycosyl hydrolases
family 8 [4,E1]. The enzymes which are currently known to belong to this family are listed below. - Acetobacter xylinum
endonuclease cmcAX. - Bacillus strain KSM-330 acidic endonuclease K (Endo-K). - Cellulomonas josui endoglucanase
2 (celB). - Cellulomonas uda endoglucanase. - Clostridium cellulolyticum endoglucanases C (celCCC). - Clostridium
thermocellum endoglucanases A (celA). - Erwinia chrysanthemi minor endoglucanase y (celY). - Bacillus circulans
beta-glucanase (EC 3.2.1.73). - Escherichia coli hypothetical protein yhjM. The most conserved region in these
enzymes is a stretch of about 20 residues that contains two conserved aspartates. The first aspartate is thought [5] to
act as the nucleophile in the catalytic mechanism. This region was used as a signature pattern.
Consensus pattern: A-[ST]-D-[AG]-D-x(2)-[IM]-A-x-[SA]-[LIVM]-[LIVMG]-x-A-x(3)-[FW] [The first D is an active site
residue]-

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).

[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).

[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).

[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] Alzari P.M., Souchon H., Dominguez R. Structure 4:265-275(1996).

[0804] 269. Glycosyl hydrolases family 9 active sites signatures

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC
3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria produce
a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can be
classified into families. One of these families is known as the cellulase family E [3] or as the glycosyl hydrolases family
9 [4,E1]. The enzymes which are currently known to belong to this family are listed below. - Butyrivibrio fibrisolvens
cellodextrinase 1 (ced1). - Cellulomonas fimi endoglucanases B (cenB) and C (cenC). - Clostridium cellulolyticum
endoglucanase G (celCCG). - Clostridium cellulovorans endoglucanase C (engC). - Clostridium stercorarium
endoglucanase Z (avicelase I) (celZ). - Clostridium thermocellum endoglucanases D (celD), F (celF) and I (celI). -
Fibrobacter succinogenes endoglucanase A (endA). - Pseudomonas fluorescens endoglucanase A (celA). -
Streptomyces reticuli endoglucanase 1 (cel1). - Thermomonospora fusca endoglucanase E-4 (celD). - Dictyostelium
discoideum spore germination specific endoglucanase 270-6. This slime mold enzyme may digest the spore cell wall
during germination to release the enclosed amoeba. - Endoglucanases from plants such as avocado or French bean.
In plants this enzyme may be involved in the fruit ripening process. Two of the most conserved regions in these
enzymes are centered on conserved residues which have been shown [5,6], in the endoglucanase D from Clostridium
thermocellum, to be important for the catalytic activity. The first region contains an active site histidine and the second
region contains two catalytically important residues: an aspartate and a glutamate. Both regions were used as
signature patterns.

Consensus pattern: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R [H is an active site residue]-
Consensus pattern: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA] [D and E are active site residues]-

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).

[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] Tomme P., Chauvaux S., Beguin P., Millet J., Aubert J.-P., Claeyssens M. J. Biol. Chem. 266:10313-10318(1991).

[ 6] Tomme P., van Beeumen J., Claeyssens M. Biochem. J. 285:319-324(1992).
[0796] 265. Glycosyl hydrolases family 1 signatures

It has been shown [1 to 4] that the following glycosyl hydrolases can be, on the basis of sequence similarities, classified
into a single family: - Beta-glucosidases (EC 3.2.1.21) from various bacteria such as Agrobacterium strain ATCC 21400,
Bacillus polymyxa, and Caldocellum saccharolyticum. - Two plants (clover) beta-glucosidases (EC 3.2.1.21). - Two
different beta-galactosidases (EC 3.2.1.23) from the archaebacteria Sulfolobus solfataricus (genes bgaS and lacS). -
6-phospho-beta-galactosidases (EC 3.2.1.85) from various bacteria such as Lactobacillus casei, Lactococcus lactis,
and Staphylococcus aureus. - 6-phospho-beta-glucosidases (EC 3.2.1.86) from Escherichia coli (genes bglB and ascB)
and from Erwinia chrysanthemi (gene arbB). - Plants myrosinases (EC 3.2.3.1) (sinigrinase) (thioglucosidase). -
Mammalian lactase-phlorizin hydrolase (LPH) (EC 3.2.1.108 / EC 3.2.1.62). LPH, an integral membrane glycoprotein, is
the enzyme that splits lactose in the small intestine. LPH is a large protein of about 1900 residues which contains four
tandem repeats of a domain of about 450 residues which is evolutionarily related to the above glycosyl hydrolases. One
of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has been shown
[5], in the beta-glucosidase from Agrobacterium, to be directly involved in glycosidic bond cleavage by acting as a
nucleophile. This region was used as a signature pattern. As a second signature pattern a conserved region was
selected, found in the N-terminal extremity of these enzymes; this region also contains a glutamic acid residue.
[0797] Consensus pattern: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN] [E is the active site
residue]

Note: this pattern will pick up the last two domains of LPH; the first two domains, which are removed from the LPH
precursor by proteolytic processing, have lost the active site glutamate and may therefore be inactive [4].
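
The note above can be illustrated with a toy scan. The sketch below is a hedged example only: it hand-translates the family 1 active site pattern into a Python regular expression and applies it to an invented four-domain string in which the first two "domains" carry a substitution at the glutamate position. The 15-residue domains are made up and far shorter than the real LPH repeats of about 450 residues.

```python
import re

# Hand translation of
# [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN]
GH1_ACTIVE_SITE = re.compile(r"[LIVMFSTC][LIVFYS][LIV][LIVMST]ENG[LIVMFAR][CSAGN]")

DOMAIN_LENGTH = 15
toy_domains = [
    "AAALYITGNGLAAAA",  # domain 1: glutamate replaced by G, no match expected
    "GGGLYITQNGLAGGG",  # domain 2: glutamate replaced by Q, no match expected
    "TTTLYITENGLATTT",  # domain 3: intact E-N-G core
    "PPPMFVTENGFSPPP",  # domain 4: intact E-N-G core
]
toy_precursor = "".join(toy_domains)

# Report, 1-based, which toy domains contain a match to the signature.
matched_domains = [
    m.start() // DOMAIN_LENGTH + 1 for m in GH1_ACTIVE_SITE.finditer(toy_precursor)
]
print(matched_domains)
```

Only the domains that retain the E-N-G core are reported, which mirrors the behavior described for the last two domains of LPH.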
[0798] Consensus pattern: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA]-

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Henrissat B. Protein Seq. Data Anal. 4:61-62(1991).

[ 3] Gonzalez-Candelas L., Ramon D., Polaina J. Gene 95:31-38(1990).

[ 4] El Hassouni M., Henrissat B., Chippaux M., Barras F. J. Bacteriol. 174:765-777(1992).

[ 5] Withers S.G., Warren R.A.J., Street I.P., Rupitz K., Kempton J.B., Aebersold R. J. Am. Chem. Soc. 112:
5887-5889(1990).

[0799] 266. Glycosyl hydrolases family 2 signatures

It has been shown [1,2,E1] that the following glycosyl hydrolases can be, on the basis of sequence similarities, classified
into a single family: - Beta-galactosidases (EC 3.2.1.23) from bacteria such as Escherichia coli (genes lacZ and ebgA),
Clostridium acetobutylicum, Clostridium thermosulfurogenes, Klebsiella pneumoniae, Lactobacillus delbrueckii, or
Streptococcus thermophilus and from the fungi Kluyveromyces lactis. - Beta-glucuronidase (EC 3.2.1.31) from
Escherichia coli (gene uidA) and from mammals. One of the conserved regions in these enzymes is centered on a
conserved glutamic acid residue which has been shown [3], in Escherichia coli lacZ, to be the general acid/base catalyst
in the active site of the enzyme. This region was used as a signature pattern. As a second signature pattern a highly
conserved region was selected, located some sixty residues upstream from the active site glutamate.

[0800] Consensus pattern: N-x-[LIVMFYWD]-R-[STACN](2)-H-Y-P-x(4)-[LIVMFYWS](2)-x(3)-[DN]-x(2)-G-
[LIVMFYW](4)-

Consensus pattern: [DENQLF]-[KRVW]-N-[HRY]-[STAPV]-[SAC]-[LIVMFS](3)-W-[GS]-x(2,3)-N-E [E is the active site
residue]-

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Schroeder C.J., Robert C., Lenzen G., McKay L.L., Mercenier A. J. Gen. Microbiol. 137:369-380(1991).
[ 3] Gebler J.C., Aebersold R., Withers S.G. J. Biol. Chem. 267:11126-11130(1992).

[0801] 267. Glycosyl hydrolases family 3 active site

It has been shown [1,2] that the following glycosyl hydrolases can be, on the basis of sequence similarities, classified
into a single family:

- Beta glucosidases (EC 3.2.1.21) from the fungi Aspergillus wentii (A-3), Hansenula anomala, Kluyveromyces
fragilis, Saccharomycopsis fibuligera (BGL1 and BGL2), Schizophyllum commune and Trichoderma reesei (BGL1).

- Beta glucosidases from the bacteria Agrobacterium tumefaciens (Cbg1), Butyrivibrio fibrisolvens (bglA),
Clostridium thermocellum (bglB), Escherichia coli (bglX), Erwinia chrysanthemi (bgxA) and Ruminococcus albus.

- Alteromonas strain O-7 beta-hexosaminidase A (EC 3.2.1.52).

- Bacillus subtilis hypothetical protein yzbA.

- Escherichia coli hypothetical protein ycfO and HI0959, the corresponding Haemophilus influenzae protein.






Pedwaydon J., Czelusniak J., Suzuki T., Gotoh T., Moens L., Shishikura F., Walz D., Vinogradov S. J. Mol. Evol. 27:
236-249(1988).

[0787] Plant hemoglobins signature (globin2)

Leghemoglobins [1] are hemoproteins present in the root nodules of leguminous plants. Leghemoglobins are structurally
and functionally related to hemoglobin and myoglobin. By providing oxygen to the bacteroids, they are essential for
symbiotic nitrogen fixation. Structurally related hemoglobins from the nodules of non-leguminous plants [2,3], and from
the roots of non-nodulating plants [4], have been recently sequenced. A signature pattern was developed that picks up
the sequences of plant hemoglobins exclusively.
[0788] Consensus pattern: [SN]-P-x-L-x(2)-H-A-x(3)-F

[ 1] Powell R., Gannon F. BioEssays 9:117-121(1988).

[ 2] Kortt A.A., Trinick M.J., Appleby C.A. Eur. J. Biochem. 175:141-149(1988).

[ 3] Kortt A.A., Inglis A.S., Fleming A.I., Appleby C.A. FEBS Lett. 231:341-346(1988).

[ 4] Bogusz D., Appleby C.A., Landsmann J., Dennis E.S., Trinick M.J., Peacock W.J. Nature 331:178-180(1988).
[0789] 262. Fructose-bisphosphate aldolase class-I active site (glycolytic_enz)

[0790] Fructose-bisphosphate aldolase [1,2] is a glycolytic enzyme that catalyzes the reversible aldol cleavage or
condensation of fructose-1,6-bisphosphate into dihydroxyacetone-phosphate and glyceraldehyde 3-phosphate. There
are two classes of fructose-bisphosphate aldolases with different catalytic mechanisms. Class-I aldolases [3], mainly
found in higher eukaryotes, are homotetrameric enzymes which form a Schiff-base intermediate between the C-2
carbonyl group of the substrate (dihydroxyacetone phosphate) and the epsilon-amino group of a lysine residue. In
vertebrates, three forms of this enzyme are found: aldolase A in muscle, aldolase B in liver and aldolase C in brain.
The sequence around the lysine involved in the Schiff-base is highly conserved and can be used as a signature for
this class of enzyme.

[0791] Consensus pattern: [LIVM]-x-[LIVMFYW]-E-G-x-[LS]-L-K-P-[SN] [K is involved in Schiff-base formation]

[ 1] Perham R.N. Biochem. Soc. Trans. 18:185-187(1990).

[ 2] Marsh J.J., Lebherz H.G. Trends Biochem. Sci. 17:110-113(1992).

[ 3] Freemont P.S., Dunbar B., Fothergill-Gilmore L.A. Biochem. J. 249:779-788(1988).

[0792] 263. Glycosyl hydrolases family 11 active sites signatures 

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC
3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria
produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can
be classified into families. One of these families is known as the cellulase family G [3] or as the glycosyl hydrolases
family 11 [4,E1]. The enzymes which are currently known to belong to this family are listed below.

- Aspergillus awamori xylanase C (xynC).
- Bacillus circulans, pumilus, stearothermophilus and subtilis xylanase (xynA).
- Clostridium acetobutylicum xylanase (xynB).
- Clostridium stercorarium xylanase A (xynA).
- Fibrobacter succinogenes xylanase C (xynC), which consists of two catalytic domains that both belong to family 10.
- Neocallimastix patriciarum xylanase A (xynA).
- Ruminococcus flavefaciens bifunctional xylanase XYLA (xynA). This protein consists of three domains: an N-terminal
xylanase catalytic domain that belongs to family 11 of glycosyl hydrolases; a central domain composed of short repeats
of Gln, Asn and Trp; and a C-terminal xylanase catalytic domain that belongs to family 10 of glycosyl hydrolases.
- Schizophyllum commune xylanase A.
- Streptomyces lividans xylanases B (xlnB) and C (xlnC).
- Trichoderma reesei xylanases I and II.

Two of the conserved regions in these enzymes are centered on glutamic acid residues which have both been shown [5],
in Bacillus pumilus xylanase, to be necessary for catalytic activity. Both regions were used as signature patterns.

[0793] Consensus pattern: [PSA]-[LQ]-x-E-Y-Y-[LIVM](2)-[DE]-x-[FYWHN] [E is an active site residue]
Consensus pattern: [LIVMF]-x(2)-E-[AG]-[YWG]-[QRFGS]-[SG]-[STAN]-G-x-[SAF] [E is an active site residue]

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).

[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] Ko E.P., Akatsuka H., Moriyama H., Shinmyo A., Hata Y., Katsube Y., Urabe I., Okada H. Biochem. J. 288:117-121(1992).

[0794] 264. Glycosyl hydrolase family 14 
[0795] This family consists of beta-amylases.

[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] Tomme P., Chauvaux S., Beguin P., Millet J., Aubert J.-P., Claeyssens M. J. Biol. Chem. 266:10313-10318(1991).

[ 6] Tomme P., van Beeumen J., Claeyssens M. Biochem. J. 285:319-324(1992).
[0779] 258. Matrix protein (MA), p15 (GAG_MA)

[0780] The matrix protein, p15, is encoded by the gag gene. MA is involved in pathogenicity [1].
[0781] [1] Pozsgay JM, Beilharz MW, Wines BD, Hess AD, Pitha PM. J Virol 1993;67:5989-5999.
[0782] 259. Gag polyprotein, inner coat protein p12 (GAG_P12)

[0783] The retroviral p12 is a virion structural protein. p12 is proline rich. The function carried out by p12 in assembly
and replication is unknown. p12 is associated with pathogenicity of the virus.

[1] Pozsgay JM, Beilharz MW, Wines BD, Hess AD, Pitha PM. J Virol 1993;67:5989-5999.

[0784] 260. Glutamine synthetase signatures (GLN-SYNT) 

Glutamine synthetase (EC 6.3.1.2) (GS) [1] plays an essential role in the metabolism of nitrogen by catalyzing the
condensation of glutamate and ammonia to form glutamine. There seem to be three different classes of GS [2,3,4]:

- Class I enzymes (GSI) are specific to prokaryotes, and are oligomers of 12 identical subunits. The activity of GSI-type
enzymes is controlled by the adenylation of a tyrosine residue. The adenylated enzyme is inactive.
- Class II enzymes (GSII) are found in eukaryotes and in bacteria belonging to the Rhizobiaceae, Frankiaceae, and
Streptomycetaceae families (these bacteria also have a class-I GS). GSII are octamers of identical subunits. Plants have
two or more isozymes of GSII; one of the isozymes is translocated into the chloroplast.
- Class III enzymes (GSIII) have, currently, only been found in Bacteroides fragilis and in Butyrivibrio fibrisolvens.
GSIII is a hexamer of identical chains. It is much larger (about 700 amino acids) than the GSI (450 to 470 amino acids)
or GSII (350 to 420 amino acids) enzymes.

While the three classes of GS are clearly structurally related, the sequence similarities are not so extensive. As
signature patterns, three conserved regions were selected. The first pattern is based on a conserved tetrapeptide in
the N-terminal section of the enzyme; the second one is based on a glycine-rich region which is thought to be involved
in ATP-binding. The third pattern is specific to class I glutamine synthetases and includes the tyrosine residue which
is reversibly adenylated.

[0785] Consensus pattern: [FYWL]-D-G-S-S-x(6,8)-[DENQSTAK]-[SA]-[DE]-x(2)-[LIVMFY]

Consensus pattern: K-P-[LIVMFYA]-x(3,5)-[NPAT]-G-[GSTAN]-G-x-H-x(3)-S

Consensus pattern: K-[LIVM]-x(5)-[LIVMA]-D-[RK]-[DN]-[LI]-Y [Y is the site of adenylation]
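
By way of illustration only, the three glutamine synthetase signatures can be applied together to a sequence, with the third one serving as the class I indicator. The Python sketch below is not part of the original disclosure. It assumes the patterns read [FYWL]-D-G-S-S-x(6,8)-[DENQSTAK]-[SA]-[DE]-x(2)-[LIVMFY], K-P-[LIVMFYA]-x(3,5)-[NPAT]-G-[GSTAN]-G-x-H-x(3)-S and K-[LIVM]-x(5)-[LIVMA]-D-[RK]-[DN]-[LI]-Y; the names GS_SIGNATURES, scan_gs and looks_like_class_one are hypothetical, and the test peptide is invented rather than taken from a real enzyme.

```python
import re

# The three signatures, hand-translated into regular expressions
# (x -> ".", x(m,n) -> ".{m,n}", bracketed alternatives kept as character classes).
GS_SIGNATURES = {
    "N-terminal tetrapeptide region": re.compile(
        r"[FYWL]DGSS.{6,8}[DENQSTAK][SA][DE].{2}[LIVMFY]"),
    "putative ATP-binding region": re.compile(
        r"KP[LIVMFYA].{3,5}[NPAT]G[GSTAN]G.H.{3}S"),
    "class I adenylation site": re.compile(
        r"K[LIVM].{5}[LIVMA]D[RK][DN][LI]Y"),
}

def scan_gs(sequence):
    """Report which glutamine synthetase signatures occur in a sequence."""
    return {name: bool(rx.search(sequence)) for name, rx in GS_SIGNATURES.items()}

def looks_like_class_one(sequence):
    """True when the class I specific adenylation-site signature is present."""
    return scan_gs(sequence)["class I adenylation site"]

# Invented peptide carrying only the first two signatures.
peptide = ("MM" + "FDGSS" + "A" * 6 + "DSD" + "AA" + "L"
           + "GG" + "KPL" + "A" * 3 + "NGGG" + "A" + "H" + "A" * 3 + "S" + "GG")
print(scan_gs(peptide), looks_like_class_one(peptide))
```

A sequence matching the first two signatures but not the third would, on this reading, be a candidate class II or class III enzyme; the sketch does not attempt to separate those two classes.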

[ 1] Eisenberg D., Almassy R.J., Janson C.A., Chapman M.S., Suh S.W., Cascio D., Smith W.W. Cold Spring
Harbor Symp. Quant. Biol. 52:483-490(1987).

[ 2] Kumada Y., Benson D.R., Hillemann D., Hosted T.J., Rochefort D.A., Thompson C.J., Wohlleben W., Tateno
Y. Proc. Natl. Acad. Sci. U.S.A. 90:3009-3013(1993).

[ 3] Shatters R.G., Kahn M.L. J. Mol. Evol. 29:422-428(1989).

[ 4] Brown J.R., Masuchi Y., Robb F.T., Doolittle W.F. J. Mol. Evol. 38:566-576(1994).

[0786] 261. Globins profile (globin1)

Globins are heme-containing proteins involved in binding and/or transporting oxygen [1]. They belong to a very large
and well studied family which is widely distributed in many organisms. The major groups of globins are:

- Hemoglobins (Hb) from vertebrates. Hb is the protein responsible for transporting oxygen from the lungs to other
tissues. It is a tetramer of two alpha and two beta chains. Most vertebrate species also express specific embryonic or
fetal forms of hemoglobin where the alpha or the beta chains are replaced by a chain with higher oxygen affinity, such as
the gamma, delta, epsilon and zeta chains in mammals.
- Myoglobins (Mb) from vertebrates. Mb is a monomeric protein responsible for oxygen storage in muscles.
- Invertebrate globins [2]. A wide variety of globins are found in invertebrates. Molluscs generally have one or two
muscle globins which are either monomeric or dimeric. Insects, such as the midge Chironomus thummi, have a large set
of extracellular globins. Nematodes and annelids have a variety of intracellular and extracellular globins; some of them
are multi-domain polypeptides (from two up to nine-domain globins) and some produce large, disulfide-bonded aggregates.
- Leghemoglobins (Lg) from the root nodules of leguminous plants. Lg provides oxygen for bacteroids.
- Flavohemoproteins from bacteria (Escherichia coli hmpA) and fungi [3]. These proteins consist of two distinct domains:
an N-terminal globin domain and a C-terminal FAD-containing reductase domain. In bacteria such as Vitreoscilla, the
enzyme-associated globin is a single domain protein.

All these globins seem to have evolved from a common ancestor. The profile developed to detect members of the globin
family is based on a structural alignment of selected globin sequences.

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).
[ 2] Goodman M.,

acting as a nucleophile. This region was used as a signature pattern. As a second signature pattern we selected a
conserved region, found in the N-terminal extremity of these enzymes; this region also contains a glutamic acid residue.
[0769] Consensus pattern: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN] [E is the active site
residue]. Sequences known to belong to this class detected by the pattern: ALL.

[0770] Note: this pattern will pick up the last two domains of LPH; the first two domains, which are removed from the
LPH precursor by proteolytic processing, have lost the active site glutamate and may therefore be inactive [4].

[0771] Consensus pattern: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA]. Sequences
known to belong to this class detected by the pattern: ALL.

[0772] Note: this pattern will pick up the last three domains of LPH.



[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Henrissat B. Protein Seq. Data Anal. 4:61-62(1991).

[ 3] Gonzalez-Candelas L., Ramon D., Polaina J. Gene 95:31-38(1990).

[ 4] El Hassouni M., Henrissat B., Chippaux M., Barras F. J. Bacteriol. 174:765-777(1992).

[ 5] Withers S.G., Warren R.A.J., Street I.P., Rupitz K., Kempton J.B., Aebersold R. J. Am. Chem. Soc.
112:5887-5889(1990).



[0773] 256. Glyco_hydro_20 

Glycosyl hydrolase family 20 

Previous Pfam IDs: glycosyl_hydr11;

Number of members: 33

[0774] 257. (Glyco_hydro_9) 

Glycosyl hydrolases family 9 active sites signatures 

(aka Glycosyl_hydr12)

[0775] The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases
(EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria
produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities,
can be classified into families. One of these families is known as the cellulase family E [3] or as the glycosyl hydrolases
family 9 [4,E1]. The enzymes which are currently known to belong to this family are listed below.

- Butyrivibrio fibrisolvens cellodextrinase 1 (ced1).
- Cellulomonas fimi endoglucanases B (cenB) and C (cenC).
- Clostridium cellulolyticum endoglucanase G (celCCG).
- Clostridium cellulovorans endoglucanase C (engC).
- Clostridium stercorarium endoglucanase Z (avicelase I) (celZ).
- Clostridium thermocellum endoglucanases D (celD), F (celF) and I (celI).
- Fibrobacter succinogenes endoglucanase A (endA).
- Pseudomonas fluorescens endoglucanase A (celA).
- Streptomyces reticuli endoglucanase 1 (cel1).
- Thermomonospora fusca endoglucanase E-4 (celD).
- Dictyostelium discoideum spore germination specific endoglucanase 270-6. This slime mold enzyme may digest
the spore cell wall during germination, to release the enclosed amoeba.
- Endoglucanases from plants such as avocado or French bean. In plants this enzyme may be involved in the fruit
ripening process.

[0776] Two of the most conserved regions in these enzymes are centered on conserved residues which have been
shown [5,6], in the endoglucanase D from Clostridium thermocellum, to be important for the catalytic activity. The
first region contains an active site histidine and the second region contains two catalytically important residues: an
aspartate and a glutamate. Both regions were used as signature patterns.

[0777] Consensus pattern: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R [H is an active site residue].
Sequences known to belong to this class detected by the pattern: ALL, except for Cellulomonas fimi cenC and
Streptomyces reticuli cel1.

[0778] Consensus pattern: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA] [D and E are active site residues].
Sequences known to belong to this class detected by the pattern: ALL, except for Fibrobacter succinogenes endA, whose
sequence seems to be incorrect.



[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).

[0759] Prokaryotic and eukaryotic PG, as well as exoPG, share a few regions of sequence similarity. The best conserved
of these regions was selected. It is centered on a conserved histidine most probably involved in the catalytic mechanism
[4].

[0760] Consensus pattern: [GSDENKRH]-x(2)-[VMFC]-x(2)-[GS]-H-G-[LIVMAG]-x(1,2)-[LIVM]-G-S [H is the putative
active site residue]. Sequences known to belong to this class detected by the pattern: ALL.
[0761] Note: these proteins belong to family 28 in the classification of glycosyl hydrolases [5].

[ 1] Ruttkowski E., Labitzke R., Khanh N.Q., Loeffler F., Gottschalk M., Jany K.-D. Biochim. Biophys. Acta
1087:104-106(1990).

[ 2] Huang J., Schell M.A. J. Bacteriol. 172:3879-3887(1990).

[ 3] He S.Y., Collmer A. J. Bacteriol. 172:4988-4995(1990).

[ 4] Bussink H.J.D., Buxton F.P., Visser J. Curr. Genet. 19:467-474(1991).

[ 5] Henrissat B. Biochem. J. 280:309-316(1991).

[0762] 254. (Glyco_hydro_32)
Glycosyl hydrolases family 32 active site

[0763] It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the
basis of sequence similarities:

- Inulinase (EC 3.2.1.7) (or inulase) from the fungus Kluyveromyces marxianus.

- Beta-fructofuranosidase (EC 3.2.1.26), commonly known as invertase in fungi and plants and as sucrase in bacteria
(gene sacA or scrB).

- Raffinose invertase (EC 3.2.1.26) (gene rafD) from Escherichia coli plasmid pRSD2.

- Levanase (EC 3.2.1.65) (gene sacC) from Bacillus subtilis.

[0764] One of the conserved regions in these enzymes is located in the N-terminal section and contains an aspartic
acid residue which has been shown [3], in yeast invertase, to be important for the catalytic mechanism. This region was
used as a signature pattern.

[0765] Consensus pattern: H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G [D is the active site residue]. Sequences known to belong
to this class detected by the pattern: ALL.
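
As an illustration only, a consensus pattern of this kind can be checked against a sequence by translating its notation (x for any residue, square brackets for alternatives, curly braces for excluded residues, a number in parentheses for repeats) into a regular expression. The following Python sketch is not part of the original disclosure. The function name prosite_to_regex and the test peptide are hypothetical, the pattern is read as H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G, and only the notation elements just listed are handled.

```python
import re

# One pattern element: a residue, "x", a [..] set or a {..} exclusion,
# optionally followed by a repeat count such as (2) or (0,1).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            residue = "."                          # any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"   # any residue except those listed
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(residue + repeat)
    return "".join(parts)

FAMILY32 = "H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G"
regex = prosite_to_regex(FAMILY32)
hit = re.search(regex, "MKHAAPGGGGLNDPNGA")  # invented peptide, not a real invertase
print(regex, hit.start() if hit else None)
```

Replacing the aspartate in the test peptide with any other residue removes the match, which is the behaviour expected of a signature centered on the active site residue.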

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Gunasekaran P., Karunakaran T., Cami B., Mukundan A.G., Preziosi L., Baratti J. J. Bacteriol. 172:6727-6735
(1990).

[ 3] Reddy V.A., Maley F. J. Biol. Chem. 265:10817-10820(1990).

[0766] 255. (Glyco_hydro_1)
Glycosyl hydrolases family 1 signatures

[0767] It has been shown [1 to 4] that the following glycosyl hydrolases can be, on the basis of sequence similarities,
classified into a single family:

- Beta-glucosidases (EC 3.2.1.21) from various bacteria such as Agrobacterium strain ATCC 21400, Bacillus
polymyxa, and Caldocellum saccharolyticum.

- Two plant (clover) beta-glucosidases (EC 3.2.1.21).

- Two different beta-galactosidases (EC 3.2.1.23) from the archaebacterium Sulfolobus solfataricus (genes bgaS and
lacS).

- 6-phospho-beta-galactosidases (EC 3.2.1.85) from various bacteria such as Lactobacillus casei, Lactococcus
lactis, and Staphylococcus aureus.

- 6-phospho-beta-glucosidases (EC 3.2.1.86) from Escherichia coli (genes bglB and ascB) and from Erwinia
chrysanthemi (gene arbB).

- Plant myrosinases (EC 3.2.3.1) (sinigrinase) (thioglucosidase).

- Mammalian lactase-phlorizin hydrolase (LPH) (EC 3.2.1.108 / EC 3.2.1.62). LPH, an integral membrane
glycoprotein, is the enzyme that splits lactose in the small intestine. LPH is a large protein of about 1900 residues which
contains four tandem repeats of a domain of about 450 residues which is evolutionarily related to the above glycosyl
hydrolases.

[0768] One of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has
been shown [5], in the beta-glucosidase from Agrobacterium, to be directly involved in glycosidic bond cleavage by

[0748] 251. (Glyco_hydro_17)
Glycosyl hydrolases family 17 signature
(aka glycosyl_hydro4)

[0749] It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the
basis of sequence similarities:

- Glucan endo-1,3-beta-glucosidases (EC 3.2.1.39) (endo-(1->3)-beta-glucanase) from various plants. This enzyme
may be involved in the defense of plants against pathogens through its ability to degrade fungal cell wall
polysaccharides.

- Glucan 1,3-beta-glucosidase (EC 3.2.1.58) (exo-(1->3)-beta-glucanase) from yeast (gene BGL2). This enzyme
may play a role in cell expansion during growth, in cell-cell fusion during mating, and in spore release during
sporulation.

- Lichenases (EC 3.2.1.73) (endo-(1->3,1->4)-beta-glucanase) from various plants.

[0750] The best conserved region in the sequence of these enzymes is located in their central section. This region
contains a conserved tryptophan residue which could be involved in the interaction with the glucan substrates [2], and
it also contains a conserved glutamate which has been shown [3] to act as the nucleophile in the catalytic mechanism.
This region was used as a signature pattern.

[0751] Consensus pattern: [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ] [E is an active site
residue]. Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Ori N., Sessa G., Lotan T., Himmelhoch S., Fluhr R. EMBO J. 9:3429-3436(1990).

[ 3] Varghese J.N., Garrett T.P.J., Colman P.M., Chen L., Hoj P.J., Fincher G.B. Proc. Natl. Acad. Sci. U.S.A.
91:2785-2789(1994).

[0752] 252. (Glyco_hydro_3) 
Glycosyl hydrolases family 3 active site 

[0753] It has been shown [1,2] that the following glycosyl hydrolases can be, on the basis of sequence similarities,
classified into a single family:

- Beta glucosidases (EC 3.2.1.21) from the fungi Aspergillus wentii (A-3), Hansenula anomala, Kluyveromyces
fragilis, Saccharomycopsis fibuligera (BGL1 and BGL2), Schizophyllum commune and Trichoderma reesei (BGL1).

- Beta glucosidases from the bacteria Agrobacterium tumefaciens (Cbg1), Butyrivibrio fibrisolvens (bglA),
Clostridium thermocellum (bglB), Escherichia coli (bglX), Erwinia chrysanthemi (bgxA) and Ruminococcus albus.

- Alteromonas strain O-7 beta-hexosaminidase A (EC 3.2.1.52).

- Bacillus subtilis hypothetical protein yzbA.

- Escherichia coli hypothetical protein ycfO and HI0959, the corresponding Haemophilus influenzae protein.

[0754] One of the conserved regions in these enzymes is centered on a conserved aspartic acid residue which has
been shown [3], in Aspergillus wentii beta-glucosidase A3, to be implicated in the catalytic mechanism. This region
was used as a signature pattern.

[0755] Consensus pattern: [LIVM](2)-[KR]-x-[EQK]-x(4)-G-[LIVMFT]-[LIVT]-[LIVMF]-[ST]-D-x(2)-[SGADNI] [D is the
active site residue]. Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Castle L.A., Smith K.D., Morris R.O. J. Bacteriol. 174:1478-1486(1992).
[ 3] Bause E., Legler G. Biochim. Biophys. Acta 626:459-465(1980).

[0756] 253. (Glyco_hydro_28) 
Polygalacturonase active site (aka PG) 

[0757] Polygalacturonase (EC 3.2.1.15) (PG) (pectinase) [1,2] catalyzes the random hydrolysis of
1,4-alpha-D-galactosiduronic linkages in pectate and other galacturonans. In fruit, polygalacturonase plays an important
role in cell wall metabolism during ripening. In plant bacterial pathogens such as Erwinia carotovora or Pseudomonas
solanacearum and fungal pathogens such as Aspergillus niger, polygalacturonase is involved in maceration and
soft-rotting of plant tissue.

[0758] Exo-poly-alpha-D-galacturonosidase (EC 3.2.1.82) (exoPG) [3] hydrolyzes pectic acid from the non-reducing
end, releasing digalacturonate.

collectively termed MAGUKs (membrane-associated guanylate kinase homologs) [5]:

- Drosophila lethal(1)discs large-1 tumor suppressor protein (gene dlg1). This protein is associated with septate
junctions in developing flies, and defects in the dlg1 gene cause neoplastic overgrowth of the imaginal disks.
- Mammalian tight junction protein ZO-1.
- A family of mammalian synaptic proteins that seem to interact with the cytoplasmic tail of NMDA receptor subunits.
This family currently consists of SAP90/PSD-95, CHAPSYN-110/PSD-93, SAP97/DLG1 and SAP102.
- Vertebrate 55 Kd erythrocyte membrane protein (p55). p55 is a palmitoylated, membrane-associated protein of
unknown function.
- Caenorhabditis elegans protein lin-2, which may play a structural role in the induction of the vulva.
- Rat protein CASK.
- Human protein DLG2.
- Human protein DLG3.

There is an ATP-binding site (P-loop) in the N-terminal section of GK. This region is not conserved in the GK-like
domain of the above proteins, which are therefore unlikely to be kinases. However, these proteins retain the residues
known, in GK, to be involved in the binding of GMP. As a signature pattern, a highly conserved region was selected
that contains two arginines and a tyrosine which are involved in GMP-binding.
[0738] Consensus pattern: T-[ST]-R-x(2)-[KR]-x(2)-[DE]-x(2)-G-x(2)-Y-x-[FY]-[LIVMK]

[ 1] Stehle T., Schulz G.E. J. Mol. Biol. 224:1127-1141(1992).

[ 2] Bryant P.J., Woods D.F. Cell 68:621-622(1992).
[ 3] Goebl M.G. Trends Biochem. Sci. 17:99-99(1992).

[ 4] Zschocke P.D., Schiltz E., Schulz G.E. Eur. J. Biochem. 213:263-269(1993).
[ 5] Woods D.F., Bryant P.J. Mech. Dev. 44:85-89(1994).

[0739] 249. (Glyco_hydro_35) 

Glycosyl hydrolases family 35 putative active site 

[0740] Beta-galactosidases (EC 3.2.1.23) from mammals, fungi, plants and the bacterium Xanthomonas manihotis are
evolutionarily related [1,2]. They belong to family 35 in the classification of glycosyl hydrolases [3,E1].
[0741] Mammalian beta-galactosidase is a lysosomal enzyme (gene GLB1) which cleaves the terminal galactose
from gangliosides, glycoproteins, and glycosaminoglycans and whose deficiency is the cause of the genetic disease
Gm(1) gangliosidosis (Morquio disease type B).

[0742] One of the best conserved regions in these enzymes contains a glutamic acid residue which, on the basis of
similarities with other families of glycosyl hydrolases [4], probably acts as the proton donor in the catalytic mechanism.
This region was used as a signature pattern.

[0743] Consensus pattern: G-G-P-[LIVM](2)-x(2)-Q-x-E-N-E-[FY] [The second E is the putative active site residue].
Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Taron C.H., Benner J.S., Hornstra L.J., Guthrie E.P. Glycobiology 5:603-610(1995).

[ 2] Carey A.T., Holt K., Picard S., Wilde R., Tucker G.A., Bird C.R., Schuch W., Seymour G.B. Plant Physiol.
108:1099-1107(1995).

[ 3] Henrissat B., Bairoch A. Biochem. J. 293:781-788(1993).

[ 4] Henrissat B., Callebaut I., Fabrega S., Lehn P., Mornon J.-P., Davies G. Proc. Natl. Acad. Sci. U.S.A.
92:7090-7094(1995).

[0744] 250. (Glyco_hydro_16) 
Glycosyl hydrolases family 16 signature 

[0745] It has been shown [1] that the following glycosyl hydrolases can be classified into a single family on the basis
of sequence similarities:

- Bacterial beta-1,3-1,4-glucanases, or lichenases, (EC 3.2.1.73) mainly from Bacillus but also from Clostridium
thermocellum (gene licB), Fibrobacter succinogenes and Rhodothermus marinus (gene bglA).

- Bacillus circulans beta-1,3-glucanase A1 (EC 3.2.1.39) (gene glcA).
- Lamarinase (EC 3.2.1.6) from Clostridium thermocellum (gene lam1).
- Streptomyces coelicolor agarase (EC 3.2.1.81) (gene dagA).
- Alteromonas carrageenovora kappa-carrageenase (EC 3.2.1.83) (gene cgkA).

[0746] Two closely clustered conserved glutamates have been shown [2] to be involved in the catalytic activity of
Bacillus licheniformis lichenase. The region that contains these residues was used as a signature pattern.

[0747] Consensus pattern: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA] [The two E's are active site residues]
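
To show how a signature of this kind can be used to locate the catalytic residues in a sequence, the following Python sketch (an illustration, not part of the original text) hand-translates the pattern, read here as E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA], into a regular expression whose two capturing groups mark the glutamates. The names FAMILY16 and active_site_glutamates and both test peptides are hypothetical.

```python
import re

# Hand translation of E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA].
# The two groups capture the glutamates described as active site residues;
# ".?" stands for the optional residue x(0,1).
FAMILY16 = re.compile(r"(E)[LIV]D[LIV].?(E).{2}[GQ][KRNF].[PSTA]")

def active_site_glutamates(sequence):
    """Return 1-based positions of both glutamates for every pattern match."""
    return [(m.start(1) + 1, m.start(2) + 1) for m in FAMILY16.finditer(sequence)]

# Invented peptides: the first lacks the optional residue, the second has it.
print(active_site_glutamates("AAEIDIEKKGKAPAA"))   # [(3, 7)]
print(active_site_glutamates("AAEIDIGEKKGKAPAA"))  # [(3, 8)]
```

The optional x(0,1) position is the reason the second glutamate is reported from the match itself rather than at a fixed offset from the first.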

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Juncosa M., Pons J., Dot T., Querol E., Planas A. J. Biol. Chem. 269:14530-14535(1994).

[0730] The proteins known to belong to this family are:

- Glypican 1 (GPC1).

- Glypican 2 (GPC2) or cerebroglycan.

- Glypican 3 (GPC3) or OCI-5. In man, defects in GPC3 are the cause of an X-linked genetic disease,
Simpson-Golabi-Behmel syndrome (SGBS).

- K-glypican.

- Glypican 5 (GPC5).

- Drosophila protein dally.

[0731] The signature pattern that was developed for glypicans is located in the central section of the extracellular
domain and contains five of the conserved cysteines.

[0732] Consensus pattern: C-x(2)-C-x-G-[LIVM]-x(4)-P-C-x(2)-[FY]-C-x(2)-[LIVM]-x(2)-G-C [The C's are probably
involved in disulfide bonds]. Sequences known to belong to this class detected by the pattern: ALL, except for dally.

[ 1] Weksberg R., Squire J.A., Templeton D.M. Nat. Genet. 12:225-227(1996).
[ 2] Watanabe K., Yamada H., Yamaguchi Y. J. Cell Biol. 130:1207-1218(1995).

[0733] 246. Granins signatures 

Granins (chromogranins or secretogranins) [1] are a family of acidic proteins present in the secretory granules of a
wide variety of endocrine and neuro-endocrine cells. The exact function(s) of these proteins is not yet known, but they
seem to be the precursors of biologically active peptides and/or they may act as helper proteins in the packaging of
peptide hormones and neuropeptides. Three members of this family of proteins show some sequence similarities:

- Chromogranin A (CGA) [2]. CGA is a protein of about 420 residues; it is the precursor of the peptide pancreastatin,
which strongly inhibits glucose-induced insulin release from the pancreas.
- Secretogranin 1 (chromogranin B). A sulfated protein of about 600 residues.
- Secretogranin 2 (chromogranin C). A sulfated protein of about 650 residues.

Apart from their subcellular location and the abundance of acidic residues (Asp and Glu), these proteins do not share
many structural similarities. Only one short region, located in the C-terminal section, is conserved in all these proteins.
Chromogranins A and B share a region of high similarity in their N-terminal section; this region includes two cysteine
residues involved in a disulfide bond.

[0734] Consensus pattern: [DE]-[SN]-L-[SAN]-x(2)-[DE]-x-E-L

Consensus pattern: C-[LIVM](2)-E-[LIVM](2)-S-[DN]-[STA]-L-x-K-x-S-x(3)-[LIVM]-[STA]-x-E-C [The two C's are linked
by a disulfide bond]






[ 1] Huttner W.B., Gerdes H.-H., Rosa P. Trends Biochem. Sci. 16:27-30(1991).
[ 2] Simon J.-P., Aunis D. Biochem. J. 262:1-13(1989).



[0735] 247. grpE protein signature 

In prokaryotes the grpE protein [1] stimulates, jointly with dnaJ, the ATPase activity of the dnaK chaperone. It seems
to accelerate the release of ADP from dnaK, thus allowing dnaK to recycle more efficiently. GrpE is a protein of about
22 to 25 Kd. In yeast, an evolutionarily related mitochondrial protein (gene GRPE) has been shown [2] to associate with
the mitochondrial hsp70 protein and thus to play a role in the import of proteins from the cytoplasm. As a signature
pattern, the most conserved region of grpE was selected. It is located in the C-terminal section.
[0736] Consensus pattern: ...-A-[LIVMTN]-x(16,20)-G-[FY]-x(3)-[DEG]-x(2)-[LIVM]-...

[ 1] Georgopoulos C., Welch W. Annu. Rev. Cell Biol. 9:601-635(1993).

[ 2] Bolliger L., Deloche O., Glick B.S., Georgopoulos C., Jenoe P., Kronidou N., Horst M., Morishima N., Schatz
G. EMBO J. 13:1998-2006(1994).


[0737] 248. Guanylate kinase signature and profile 

Guanylate kinase (EC 2.7.4.8) (GK) [1] catalyzes the ATP-dependent phosphorylation of GMP into GDP. It is essential
for recycling GMP and, indirectly, cGMP. In prokaryotes (such as Escherichia coli), lower eukaryotes (such as yeast)
and in vertebrates, GK is a highly conserved monomeric protein of about 200 amino acids. GK has been shown [2,3,4]
to be structurally similar to the following proteins:

- Protein A57R (or SalG2R) from various strains of Vaccinia virus. This protein is highly similar to GK, but contains
a frameshift mutation in the N-terminal section and could therefore be inactive in that virus.

The following proteins are characterized by the presence in their sequence of one or more copies
of the DHR domain, a SH3 domain (see <PDOC50002>) as well as a C-terminal GK-like domain; these proteins are






[0721] [1] Medline: 95252686. A family of UDP-GlcNAc/MurNAc: polyisoprenol-P GlcNAc/MurNAc-1-P transferases.
Lehrman MA; Glycobiology 1994;4:768-771.

[0722] 241. Glycosyl hydrolases family 15. 21 members.

[0723] 242. Glycosyl hydrolases family 16 signature

It has been shown [1] that the following glycosyl hydrolases can be classified into a single family on the basis of
sequence similarities:

- Bacterial beta-1,3-1,4-glucanases, or lichenases, (EC 3.2.1.73) mainly from Bacillus but also from Clostridium
thermocellum (gene licB), Fibrobacter succinogenes and Rhodothermus marinus (gene bglA).
- Bacillus circulans beta-1,3-glucanase A1 (EC 3.2.1.39) (gene glcA).
- Lamarinase (EC 3.2.1.6) from Clostridium thermocellum (gene lam1).
- Streptomyces coelicolor agarase (EC 3.2.1.81) (gene dagA).
- Alteromonas carrageenovora kappa-carrageenase (EC 3.2.1.83) (gene cgkA).

Two closely clustered conserved glutamates have been shown [2] to be involved in the catalytic activity of Bacillus
licheniformis lichenase. The region that contains these residues was used as a signature pattern.

[0724] Consensus pattern: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA] [The two E's are active site residues]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Juncosa M., Pons J., Dot T., Querol E., Planas A. J. Biol. Chem. 269:14530-14535(1994).
[0725] 243. Glycosyl hydrolases family 17 signature 

It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the basis of 
sequence similarities: - Glucan endo-1,3-beta-glucosidases (EC 3.2.1.39) (endo-(1->3)-beta-glucanase) from various
plants. This enzyme may be involved in the defense of plants against pathogens through its ability to degrade fungal
cell wall polysaccharides. - Glucan 1,3-beta-glucosidase (EC 3.2.1.58) (exo-(1->3)-beta-glucanase) from yeast (gene
BGL2). This enzyme may play a role in cell expansion during growth, in cell-cell fusion during mating, and in spore
release during sporulation. - Lichenases (EC 3.2.1.73) (endo-(1->3,1->4)-beta-glucanase) from various plants. The
best conserved region in the sequence of these enzymes is located in their central section. This region contains a
conserved tryptophan residue which could be involved in the interaction with the glucan substrates [2], and it also
contains a conserved glutamate which has been shown [3] to act as the nucleophile in the catalytic mechanism. This
region was used as a signature pattern.

Consensus pattern: [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ] [E is an active site residue]
[1] Henrissat B. Biochem. J. 280:309-316(1991).

[2] Ori N., Sessa G., Lotan T., Himmelhoch S., Fluhr R. EMBO J. 9:3429-3436(1990).

[3] Varghese J.N., Garrett T.P.J., Colman P.M., Chen L., Hoj P.J., Fincher G.B. Proc. Natl. Acad. Sci. U.S.A. 91:
2785-2789(1994).

[0726] 244. Glyoxalase I signatures 

Glyoxalase I (EC 4.4.1.5) (lactoylglutathione lyase) catalyzes the first step of the glyoxal pathway, the transformation
of methylglyoxal and glutathione into S-lactoylglutathione, which is then converted by glyoxalase II to lactic acid [1].
Glyoxalase I is a ubiquitous enzyme which binds one mole of zinc per subunit. The bacterial and yeast enzymes are
monomeric while the mammalian one is homodimeric. The sequence of glyoxalase I is well conserved. In bacteria and
mammals, the enzyme is a protein of about 130 to 180 residues, while in fungi it is about twice as long. In these organisms
the enzyme is built out of the tandem repeat of a homologous domain. Two signature patterns for this family were
derived. The first one is located in the N-terminal region, while the second one is located in the central section of the
protein and contains a conserved histidine that could be implicated in the binding of the zinc atom.
[0727] Consensus pattern: [HQ]-[IVT]-x-[LIVFY]-x-[IV]-x(5)-[STA]-x(2)-F-[YM]-x(2,3)-[LMF]-G-[LMF]
Consensus pattern: G-[NTKQ]-x(0,5)-[GA]-[LVFY]-[GH]-H-[IVF]-[CGA]-x-[STAGLE]-x(2)-[DNC]
[0728] [1] Kim N.-S., Umezawa Y., Ohmura S., Kato S. J. Biol. Chem. 268:11217-11221(1993).
[0729] 245. (Glypican) 
Glypicans signature 

Glypicans [1,2] are a family of heparan sulfate proteoglycans which are anchored to cell membranes by a glycosyl- 
phosphatidylinositol (GPI) linkage. Structurally, these proteins consist of three separate domains: 

a) A signal sequence; 

b) An extracellular domain of about 500 residues that contains 12 conserved cysteines probably involved in disulfide
bonds and which also contains the sites of attachment of the heparan sulfate glycosaminoglycan side chains;

c) A C-terminal hydrophobic region which is post-translationally removed after formation of the GPI-anchor. 
[0703] Consensus pattern: [KR]-[GSAT]-x(4)-[FYWLH]-[DQNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-[AG]-H-[LIVM] Sequences
known to belong to this class detected by the pattern: ALL.

[0704] Note: these proteins belong to family M22 in the classification of peptidases [2,E1].

[1] Abdullah K.M., Lo R.Y.C., Mellors A. J. Bacteriol. 173:5597-5603(1991).
[2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[0705] 235. (Glucosamine_iso)

Glucosamine/galactosamine-6-phosphate isomerases signature

Glucosamine-6-phosphate isomerase (EC 5.3.1.10) (or GlcN6P deaminase) is the enzyme responsible for the conversion
of glucosamine 6-phosphate into fructose 6-phosphate [1]. It is the last specific step in the pathway for N-acetylglucosamine
(GlcNAc) utilization in bacteria such as Escherichia coli (gene nagB) or in fungi such as Candida albicans
(gene NAG1). GlcN6P isomerase is evolutionarily related to: - A putative Escherichia coli galactosamine-6-phosphate
isomerase (gene agaI) [2]. - Escherichia coli hypothetical protein yieK. - Bacillus subtilis hypothetical protein ybfl. As
a signature pattern, a conserved region located in the central part of these enzymes was selected. This region contains
a conserved histidine which has been shown [1], in nagB, to be important for the pyranose ring-opening step of the
catalytic mechanism.

[0706] Consensus pattern: [LIVM]-x(3)-G-x-[LIT]-x-[LIV]-x-[LIVM]-x-G-[LIVM]-G-x-[DEN]-G-H

[1] Oliva G., Fontes M.R.M., Garratt R.C., Altamirano M.M., Calcagno M.L., Horjales E. Structure 3:1323-1332
(1995).

[2] Reizer J., Ramseier T.M., Reizer A., Charbit A., Saier M.H. Jr. Microbiology 142:231-250(1996).
[0707] 236. Pneumovirus attachment glycoprotein G (glycoprotein G)

[0708] This family includes attachment proteins from respiratory syncytial virus. Glycoprotein G has not been shown
to have any neuraminidase or hemagglutinin activity (Swiss-Prot). The amino terminus is thought to be cytoplasmic
and the carboxyl terminus extracellular. The extracellular region contains four completely conserved cysteine residues.
[0709] [1] Johnson PR, Spriggs MK, Olmsted RA, Collins PL; Proc Natl Acad Sci U S A 1987;84:5625-5629.
[0710] 237. Glycosyl transferases group 1.

[0711] Mutations in this domain of Swiss: P37287 lead to disease (paroxysmal nocturnal haemoglobinuria). Members
of this family transfer activated sugars to a variety of substrates, including glycogen, fructose-6-phosphate and
lipopolysaccharides. Members of this family transfer UDP-, ADP-, GDP- or CMP-linked sugars. The eukaryotic glycogen
synthases may be distant members of this family.

[0712] 238. Glycosyl transferases (Glycos_transf_2) 

[0713] Diverse family, transferring sugar from UDP-glucose, UDP-N-acetyl-galactosamine, GDP-mannose or CDP-abequose
to a range of substrates including cellulose, dolichol phosphate and teichoic acids.
[0714] 239. (Glycos_transf_3)

Thymidine and pyrimidine-nucleoside phosphorylases signature

[0715] Thymidine phosphorylase (EC 2.4.2.4) catalyzes the reversible phosphorolysis of thymidine, deoxyuridine
and their analogues to their respective bases and 2-deoxyribose 1-phosphate. This enzyme regulates the availability
of thymidine and is therefore essential to nucleic acid metabolism.

[0716] In Escherichia coli (gene deoA), the enzyme is a dimer of identical subunits of about 48 Kd [1]. In humans it

r^SjtC^I^'r'''"*'''^*' "^"^ "^'"9 '^'^^ as 

Phosphorylase (EC 2.4.2.2) (gene pdp) [3] is an enzyme evolutionary and 
structurally related to thymidine phosphorylase. 

[0718] A well conserved region of 19 residues located in the N-terminal part of these proteins was selected as a
signature pattern for these enzymes.

mV^I PattemS^GS]-R^GAJ-^LIV]-x(2)-^A)-[6AJ-G-T-x-D-x-[LIVJ-E Sequences known to betong to this 

class detected by the pattern: ALL.

'ii!?!!J,oon";fjr"' ^ ^'^ • T.A., Ealick S.E. J. Biol. Chem. 265: 

1 4U I D-1 4022( 1 990). 

[2] Furukawa T., Yoshimura A., Sumizawa T., Haraguchi M., Akiyama S.-I., Fukui K., Yamada Y. Nature 356:668-668
[3] Saxild H.H., Andersen L.N., Hammer K. J. Bacteriol. 178:424-434(1996).



[0720] 240. Glycos_transf_4. Glycosyl transferase. Number of members: 44.
disulfide bonds].

[1] Bruix M., Jimenez M.A., Santoro J., Gonzalez C., Colilla F.J., Mendez E., Rico M. Biochemistry 32:715-724
(1993).

[2] Gu Q., Kawata E.E., Morse M.-J., Wu H.-M., Cheung A.Y. Mol. Gen. Genet. 234:89-96(1992).

[3] Terras F.R.G., Torrekens S., Van Leuven F., Osborn R.W., Vanderleyden J., Cammue B.P.A., Broekaert W.F.
FEBS Lett. 316:233-240(1993).

[4] Bloch C. Jr., Richardson M. FEBS Lett. 279:101-104(1991).

[5] Ishibashi N., Yamauchi D., Minamikawa T. Plant Mol. Biol. 15:59-64(1990).

[7] Choi Y., Choi Y.D., Lee J.S. Plant Physiol. 101:699-700(1993).

[0689] 231. Gelsolin. Gelsolin repeat. Number of members: 170

[0690] [1] Medline: 97433077. The crystal structure of plasma gelsolin: implications for actin severing, capping, and
nucleation. Burtnick LD, Koepf EK, Grimes J, Jones EY, Stuart DI, McLaughlin PJ, Robinson RC; Cell 1997;90:661-670.
[0691] 232. Germin family signature 

Germins [1] are a family of homopentameric cereal glycoproteins expressed during germination which may play a role
in altering the properties of cell walls during germinative growth. It has been shown that wheat and barley germins act
as oxalate oxidases (EC 1.2.3.4), an enzyme that catalyzes the oxidative degradation of oxalate to carbonate and
hydrogen peroxide. Germins are highly similar to: - Germin-like proteins from various plants such as rape, violet or
white mustard. - Slime mold spherulins 1a and 1b, which are proteins that accumulate specifically during spherulation,
a process induced by various forms of environmental stress which leads to encystment and dormancy. As a signature
pattern the best conserved region was selected: a decapeptide located in the central section of these proteins.
[0692] Consensus pattern: G-x(4)-H-x-H-P-x-A-x-E-[LIVM]- 
[0693] [1] Lane B.G. FASEB J. 8:294-301(1994). 
[0694] 233. (GlutR) 
Glutamyl-tRNA reductase signature

[0695] Delta-aminolevulinic acid (ALA) is the obligatory precursor for the synthesis of all tetrapyrroles, including
porphyrin derivatives such as chlorophyll and heme. ALA can be synthesized via two different pathways: the Shemin (or
C4) pathway, which involves the single-step condensation of succinyl-CoA and glycine and which is catalyzed by ALA
synthase (EC 2.3.1.37), and the C5 pathway, from the five-carbon skeleton of glutamate. The C5 pathway operates
in the chloroplast of plants and algae, in cyanobacteria, in some eubacteria and in archaebacteria.
[0696] The initial step in the C5 pathway is carried out by glutamyl-tRNA reductase (GluTR) [1], which catalyzes the
NADP-dependent conversion of glutamate-tRNA(Glu) to glutamate-1-semialdehyde (GSA) with the concomitant release
of tRNA(Glu), which can then be recharged with glutamate by glutamyl-tRNA synthetase.
[0697] GluTR is a protein of about 50 Kd (467 to 550 residues) which contains a few conserved regions. The best
conserved region is located in positions 99 to 122 in the sequence of known GluTR. This region seems important for
the activity of the enzyme. We have developed a signature pattern from that conserved region.

[0698] Consensus pattern: H-[LIVM]-x(2)-[LIVM]-[GSTAC](3)-[LIVM]-[DEQ]-S-[LIVMA]-[LIVM](2)-[GF]-E-x-[EQR]-
[IV]-[LIT]-[STAG]-Q-[LIVM]-[KR] Sequences known to belong to this class detected by the pattern: ALL.

[0699] [1] Jahn D., Verkamp E., Soell D. Trends Biochem. Sci. 17:215-218(1992).

[0700] 234. (Glycoprotease) 

Glycoprotease family signature (aka Peptidase_M22) 

[0701] Glycoprotease (GCP) (EC 3.4.24.57) [1], or O-sialoglycoprotein endopeptidase, is a metalloprotease secreted
by Pasteurella haemolytica which specifically cleaves O-sialoglycoproteins such as glycophorin A. The sequence of
GCP is highly similar to the following uncharacterized proteins:

- Escherichia coli hypothetical protein ygjD (ORF-X).
- Bacillus subtilis hypothetical protein ydiE.
- Mycobacterium leprae hypothetical protein U229E.
- Mycobacterium tuberculosis hypothetical protein MtCY78.10.
- Synechocystis strain PCC 6803 hypothetical protein slr0807.
- Methanococcus jannaschii hypothetical protein MJ1130.
- Haloarcula marismortui hypothetical protein in HSH 3' region.
- Yeast hypothetical protein YKR038c.
- Yeast hypothetical protein QRI7.



[0702] One of the conserved regions contains two conserved histidines. It is possible that this region is involved in
coordinating a metal ion such as zinc.
Chem. 266:10429-10437(1991).

[7] Forchhammer K., Leinfelder W., Boeck A. Nature 342:453-466(1989).
[8] Manavathu E.K., Hiratsuka K., Taylor D.E. Gene 62:17-26(1988).

[9] Leblanc D.J., Lee L.N., Titmas B.M., Smith C.J., Tenover F.C. J. Bacteriol. 170:3618-3626(1988).

[10] Cervantes E., Sharma S.B., Maillet F., Vasse J., Truchet G., Rosenberg C. Mol. Microbiol. 3:745-755(1989).

[11] Plunkett G. III, Burland V.D., Daniels D.L., Blattner F.R. Nucleic Acids Res. 21:3391-3398(1993).

[12] Moller W., Schipper A., Amons R. Biochimie 69:983-989(1987).

[0681] 228. GTP cyclohydrolase II.

GTP cyclohydrolase II catalyses the first committed step in the biosynthesis of riboflavin.

[0682] [1] Richter G, Ritz H, Katzenmeier G, Volk R, Kohnle A, Lottspeich F, Allendorf D, Bacher A; J Bacteriol 1993;
175:4045-4051.

[0683] 229. Galactose-1-phosphate uridyl transferase signatures (GalP_UDP_transf)

Galactose-1-phosphate uridyl transferase (EC 2.7.7.10) (galT) catalyzes the transfer of a uridyl diphosphate group to
galactose (or glucose) 1-phosphate. During the reaction, the uridyl moiety links to a histidine residue. In the Escherichia
coli enzyme, it has been shown [1] that two histidine residues separated by a single proline residue are essential for
enzyme activity. On the basis of sequence similarities, two apparently unrelated families seem to exist. Class-I enzymes
are found in eukaryotes as well as some bacteria such as Escherichia coli or Streptomyces lividans, while class-II
enzymes have been found so far only in bacteria such as Bacillus subtilis or Lactobacillus helveticus [2]. Signature
patterns for both families were developed. For class-I enzymes the signature is based on the active site residues. For
class-II enzymes a region which also includes two conserved histidines was chosen.

Consensus pattern: F-E-N-[RK]-G-x(3)-G-x(4)-H-P-H-x-Q [The two H's are the active site residues]

[0684] Consensus pattern: D-L-P-I-V-G-G-[ST]-[LIVM](2)-[SA]-H-[DEN]-H-[FY]-Q-G-G. Note: class-I enzymes are
structurally related to the HIT family of proteins (see <PDOC00694>).

[1] Reichardt J.K.V., Berg P. Nucleic Acids Res. 16:9017-9026(1988).
[2] Mollet B., Pilloud N. J. Bacteriol. 173:4464-4473(1991).

[0685] 230. Gamma-thionins family signature 

[0686] The following small plant proteins are evolutionarily related:

- Gamma-thionins from wheat endosperm (gamma-purothionins) and barley (gamma-hordothionins), which are toxic
to animal cells and inhibit protein synthesis in cell-free systems [1].
- A flower-specific thionin (FST) from tobacco [2].
- Antifungal proteins (AFP) from the seeds of Brassicaceae species such as radish, mustard, turnip and Arabidopsis
thaliana [3].
- Inhibitors of insect alpha-amylases from sorghum [4].
- Probable protease inhibitor P322 from potato.
- A germination-related protein from cowpea [5].
- Anther-specific protein SF18 from sunflower [6]. SF18 is a protein that contains a gamma-thionin domain at its
N-terminus and a proline-rich C-terminal domain.
- Soybean sulfur-rich protein SE60 [7].
- Vicia faba antibacterial peptides fabatin-1 and -2.

[0687] In their mature form, these proteins generally consist of about 45 to 50 amino acid residues. As shown in the
following schematic representation, these peptides contain eight conserved cysteines involved in disulfide bonds.

xxCxxxxxxxxxxCxxxxxCxxxCxxxxxxxxxCxxxxxxCxCxxxC

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.
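The schematic can be checked against the surrounding statements (eight conserved cysteines, a mature length of about 45 to 50 residues) with a few lines of code. This is only an illustrative sketch: the string is copied from the schematic line, and the helper name cysteine_positions is hypothetical.

```python
# Schematic of the gamma-thionin fold as printed in the text:
# 'C' marks a conserved cysteine, 'x' any other residue.
SCHEMATIC = "xxCxxxxxxxxxxCxxxxxCxxxCxxxxxxxxxCxxxxxxCxCxxxC"

def cysteine_positions(schematic: str) -> list[int]:
    """Return the 1-based positions of the conserved cysteines."""
    return [index + 1 for index, residue in enumerate(schematic) if residue == "C"]

positions = cysteine_positions(SCHEMATIC)
# Spacing between successive cysteines (number of intervening residues).
spacings = [later - earlier - 1 for earlier, later in zip(positions, positions[1:])]

print("length:", len(SCHEMATIC))
print("cysteines:", len(positions))
print("positions:", positions)
print("spacings:", spacings)
```

An even cysteine count is what one would expect if every cysteine takes part in a disulfide bond, since each bond pairs two residues.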



[0688] Consensus pattern: [KRG]-x-C-x(3)-[SV]-x(2)-[FYWH]-x-[GF]-x-C-x(5)-C-x(3)-C [The four C's are involved in
binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.). - DNA mismatch repair proteins mutS family (see
<PDOC00388>). - Bacterial type II secretion system protein E (see <PDOC00567>). Not all ATP- or GTP-binding proteins
are picked up by this motif. A number of proteins escape detection because the structure of their ATP-binding
site is completely different from that of the P-loop. Examples of such proteins are the E1-E2 ATPases or the glycolytic
kinases. In other ATP- or GTP-binding proteins the flexible loop exists in a slightly different form; this is the case for
tubulins or protein kinases. A special mention must be reserved for adenylate kinase, in which there is a single deviation
from the P-loop pattern: in the last position Gly is found instead of Ser or Thr.

- Consensus pattern: [AG]-x(4)-G-K-[ST]
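As an illustration of how the P-loop pattern behaves, including the adenylate kinase deviation described above, the pattern can be written by hand as a regular expression and run over short sequences. This is a sketch only: the function name find_p_loops is hypothetical, and the two fragments are used purely as examples (the first is assumed to be a Ras-type N-terminal fragment, the second an adenylate-kinase-type loop ending in Gly).

```python
import re

# [AG]-x(4)-G-K-[ST] written directly as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")

def find_p_loops(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each P-loop hit."""
    return [(hit.start() + 1, hit.group()) for hit in P_LOOP.finditer(sequence)]

ras_like = "MTEYKLVVVGAGGVGKSALT"   # glycine-rich loop followed by K-S
adk_like = "GGPGSGKGT"              # same loop, but Gly in the last position

print(find_p_loops(ras_like))
print(find_p_loops(adk_like))
```

The second fragment is not reported because its last pattern position holds Gly rather than Ser or Thr, which is the single deviation the text attributes to adenylate kinase.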



[1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).

[3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).

[6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).

[7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:
571-592(1990).

[8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).

[9] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122
(1989).

[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).
[0679] GTP-binding elongation factors signature (GTP_EFTU2) 

Elongation factors [1,2] are proteins catalyzing the elongation of peptide chains in protein biosynthesis. In both
prokaryotes and eukaryotes, there are three distinct types of elongation factors, as described in the following table:

Eukaryotes   Prokaryotes   Function
EF-1alpha    EF-Tu         Binds GTP and an aminoacyl-tRNA; delivers the latter to the A site of ribosomes.
EF-1beta     EF-Ts         Interacts with EF-1a/EF-Tu to displace GDP and thus allows the regeneration of GTP-EF-1a.
EF-2         EF-G          Binds GTP and peptidyl-tRNA and translocates the latter from the A site to the P site.

The GTP-binding elongation factor family also

includes the following proteins: - Eukaryotic peptide chain release factor GTP-binding subunits [3]. These proteins
interact with release factors that bind to ribosomes that have encountered a stop codon at their decoding site and help
them to induce release of the nascent polypeptide. The yeast protein was known as SUP2 (and also as SUP35, SUF12
or GST1) and the human homolog as GST1-Hs. - Prokaryotic peptide chain release factor 3 (RF-3) (gene prfC). RF-3
is a class-II RF, a GTP-binding protein that interacts with class I RFs (see <PDOC00607>) and enhances their activity
[4]. - Prokaryotic GTP-binding protein lepA and its homolog in yeast (gene GUF1) and in Caenorhabditis elegans
(ZK1236.1). - Yeast HBS1 [5]. - Rat statin S1 [6], a protein of unknown function which is highly similar to EF-1alpha. -
Prokaryotic selenocysteine-specific elongation factor selB [7], which seems to replace EF-Tu for the insertion of
selenocysteine directed by the UGA codon. - The tetracycline resistance proteins tetM/tetO [8,9] from various bacteria
such as Campylobacter jejuni, Enterococcus faecalis, Streptococcus mutans and Ureaplasma urealyticum. Tetracycline
binds to the prokaryotic ribosomal 30S subunit and inhibits binding of aminoacyl-tRNAs. These proteins abolish the
inhibitory effect of tetracycline on protein synthesis. - Rhizobium nodulation protein nodQ [10]. - Escherichia coli
hypothetical protein yihK [11]. In EF-1-alpha, a specific region has been shown [12] to be involved in a conformational
change mediated by the hydrolysis of GTP to GDP. This region is conserved in both EF-1alpha/EF-Tu and EF-2/EF-G
and thus seems typical for GTP-dependent proteins which bind non-initiator tRNAs to the ribosome. The pattern
developed for this family of proteins includes that conserved region.

[0680] Consensus pattern: D-[KRSTGANQFYW]-x(3)-E-[KRAQ]-x-[RKQD]-[GC]-[IVMK]-[ST]-[IV]-x(2)-[GSTACKRNQ]

[1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).
[2] Moldave K. Annu. Rev. Biochem. 54:1109-1149(1985).

[3] Stansfield I., Jones K.M., Kushnirov V.V., Dagkesamanskaya A.R., Poznyakovski A.I., Paushkin S.V., Nierras
C.R., Cox B.S., Ter-Avanesyan M.D., Tuite M.F. EMBO J. 14:4365-4373(1995).

[4] Grentzmann G., Brechemier-Baey D., Heurgue-Hamard V., Buckingham R.H. J. Biol. Chem. 270:10595-10600
(1995).

[5] Nelson R.J., Ziegelhoffer T., Nicolet C., Werner-Washburne M., Craig E.A. Cell 71:97-105(1992).
[6] Ann D.K., Moutsatsos I.K., Nakamura T., Lin H.H., Mao P.-L., Lee M.-J., Chin S., Liem R.K.H., Wang E. J. Biol.
the significance of this relationship is not yet clear. Selenium, in the form of selenocysteine [7], is part of the catalytic
site of GSHPx. The sequence around the selenocysteine residue is moderately well conserved in GSHPx's and the
related proteins and can be used as a signature pattern. As a second signature for this family of proteins, a highly
conserved octapeptide located in the central section of these proteins was selected.

[0672] Consensus pattern: [GN]-[RKHNFYC]-x-[LIVMFC]-[LIVMF](2)-x-N-[VT]-x-[STC]-x-C-[GA]-x-T [C is the active
site selenocysteine residue]

Consensus pattern: [LIV]-[AGD]-F-P-[CS]-[NG]-Q

[1] Mannervik B. Meth. Enzymol. 113:490-495(1985).

[2] Mullenbach G.T., Tabrizi A., Irvine B.D., Bell G.I., Tainer J.A., Hallewell R.A. Protein Eng. 2:239-246(1988).
[3] Chu F.-F., Doroshow J.H., Esworthy R.S. J. Biol. Chem. 268:2571-2576(1993).

[4] Takahashi K., Akasaka M., Yamamoto Y., Kobayashi C., Mizoguchi J., Koyama J. J. Biochem. 108:145-148
(1990).

[5] Dunn D.K., Howells D.D., Richardson J., Goldfarb P.S. Nucleic Acids Res. 17:6390-6390(1989).
[6] Cookson E., Blaxter M.L., Selkirk M.E. Proc. Natl. Acad. Sci. U.S.A. 89:5837-5841(1992).
[7] Stadtman T.C. Annu. Rev. Biochem. 59:111-127(1990).

[0673] 225. (GST) 
Glutathione S-transferases

[0674] Function: conjugation of reduced glutathione to a variety of targets. Also included in the alignment, but not
GSTs, are:
- S-crystallins from squid. Similarity to GST was previously noted.
- Eukaryotic elongation factors 1-gamma. Not known to have GST activity; similarity not previously recognized.
Supported by HMM and manual alignment inspection.
- HSP26 family of stress-related proteins, including auxin-regulated proteins in plants and stringent starvation
proteins in E. coli. Not known to have GST activity; similarity not previously recognized. Supported by HMM and manual
alignment inspection.
Alignment spans entire protein.
[0675] 226. GTP1/OBG family signature 

A widespread family of GTP-binding proteins has been recently characterized [1,2]. This family currently includes: -
Mouse and Xenopus protein DRG. - Human protein DRG2. - Drosophila protein 128up. - Fission yeast protein gtp1. -
A Halobacterium cutirubrum hypothetical protein in a ribosomal protein gene cluster. - Bacillus subtilis protein obg.
Obg has been experimentally shown to bind GTP. - Escherichia coli hypothetical protein yhbZ. - Haemophilus influenzae
hypothetical protein HI0877. - Mycoplasma genitalium hypothetical protein MG384. - Yeast hypothetical protein
YAL036C (FUN11). - Yeast hypothetical protein YGR173w. - Caenorhabditis elegans hypothetical protein C02F5.3. The
function of the proteins that belong to this family is not yet known. They are polypeptides of about 40 to 48 Kd which
contain the five small sequence elements characteristic of GTP-binding proteins [3]. As a signature pattern the region
that corresponds to the ATP/GTP B motif (also called G-3 in GTP-binding proteins) was selected.
[0676] Consensus pattern: D-[LIVM]-P-G-[LIVM](2)-[DEY]-[GN]-A-x(2)-G-x-G

[1] Sazuka T., Tomooka Y., Ikawa Y., Noda M., Kumar S. Biochem. Biophys. Res. Commun. 189:363-370(1992).

[2] Hudson J.D., Young P.G. Gene 125:191-193(1993).

[3] Bourne H.R., Sanders D.A., McCormick F. Nature 349:117-127(1991).

[0677] 227. (GTP.EFTUI) 
ATP/GTP-binding site motif A (P-loop)

[0678] From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable
proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs.
The best conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand
and an alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is
generally referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous ATP- or GTP-binding
proteins in which the P-loop is found. Listed below are a number of protein families for which the relevance of the
presence of such a motif has been noted: - ATP synthase alpha and beta subunits (see <PDOC00137>). - Myosin heavy
chains. - Kinesin heavy chains and kinesin-like proteins (see <PDOC00343>). - Dynamins and dynamin-like proteins
(see <PDOC00362>). - Guanylate kinase (see <PDOC00670>). - Thymidine kinase (see <PDOC00524>). - Thymidylate
kinase (see <PDOC01034>). - Shikimate kinase (see <PDOC00868>). - Nitrogenase iron protein family (nifH/
frxC) (see <PDOC00580>). - ATP-binding proteins involved in 'active transport' (ABC transporters) [7] (see
<PDOC00185>). - DNA and RNA helicases [8,9,10]. - GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2,
etc.). - Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.). - Nuclear protein ran (see
<PDOC00859>). - ADP-ribosylation factors family (see <PDOC00781>). - Bacterial dnaA protein (see <PDOC00771>).
- Bacterial recA protein (see <PDOC00131>). - Bacterial recF protein (see <PDOC00539>). - Guanine nucleotide-
similarities two classes of GATase domains have been identified [2,3]: class-I (also known as trpG-type) and class-II
(also known as purF-type). Class-I GATase domains have been found in the following enzymes:

- The second component of anthranilate synthase (AS) (EC 4.1.3.27) [4]. AS catalyzes the biosynthesis of anthranilate
from chorismate and glutamine. AS is generally a dimeric enzyme: the first component can synthesize anthranilate
using ammonia rather than glutamine, whereas component II provides the GATase activity. In some
bacteria and in fungi the GATase component of AS is part of a multifunctional protein that also catalyzes other
steps of the biosynthesis of tryptophan.

- The second component of 4-amino-4-deoxychorismate (ADC) synthase (EC 4.1.3.-), a dimeric prokaryotic enzyme
that functions in the pathway that catalyzes the biosynthesis of para-aminobenzoate (PABA) from chorismate and
glutamine. The second component (gene pabA) provides the GATase activity [4].

- CTP synthase (EC 6.3.4.2). CTP synthase catalyzes the final reaction in the biosynthesis of pyrimidine, the
ATP-dependent formation of CTP from UTP and glutamine. CTP synthase is a single chain enzyme that contains two
distinct domains; the GATase domain is in the C-terminal section [2].

- GMP synthase (glutamine-hydrolyzing) (EC 6.3.5.2). GMP synthase catalyzes the ATP-dependent formation of
GMP from xanthosine 5'-phosphate and glutamine. GMP synthase is a single chain enzyme that contains two
distinct domains; the GATase domain is in the N-terminal section [5].

- Glutamine-dependent carbamoyl-phosphate synthase (EC 6.3.5.5) (GD-CPSase), an enzyme involved in both
arginine and pyrimidine biosynthesis and which catalyzes the ATP-dependent formation of carbamoyl phosphate
from glutamine and carbon dioxide. In bacteria GD-CPSase is composed of two subunits: the large chain (gene
carB) provides the CPSase activity, while the small chain (gene carA) provides the GATase activity. In yeast the
enzyme involved in arginine biosynthesis is also composed of two subunits: CPA1 (GATase) and CPA2 (CPSase).
In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large multifunctional enzyme
(called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals). The GATase domain is located at the
N-terminal extremity of this polyprotein [6].

- Phosphoribosylformylglycinamidine synthase II (EC 6.3.5.3), an enzyme that catalyzes the fourth step in the de
novo biosynthesis of purines. In some species of bacteria, FGAM synthase II is composed of two subunits: a small
chain (gene purQ) which provides the GATase activity and a large chain (gene purL) which provides the aminator
activity.

- The histidine amidotransferase hisH, an enzyme that catalyzes the fifth step in the biosynthesis of histidine in
prokaryotes.



[0668] In the second component of AS a cysteine has been shown [7] to be essential for the amidotransferase activity.
The sequence around this residue is well conserved in all the above GATase domains and can be used as a signature
pattern for class-I GATase.

[0669] Consensus pattern: [PAS]-[LIVMFYT]-[LIVMFY]-G-[LIVMFY]-C-[LIVMFYN]-G-x-[QEH]-x-[LIVMFA] [C is the
active site residue] Sequences known to belong to this class detected by the pattern: ALL, except for 6 sequences.
[0670] Note: In the first position of the pattern Pro is found in all cases except in the slime mold GD-CPSase, where
it is replaced by Ala.



[1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973).

[2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).

[3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).

[4] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989).

[5] Zalkin H., Argos P., Narayana S.V.L., Tiedeman A.A., Smith J.M. J. Biol. Chem. 260:3350-3354(1985).
[6] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[7] Tso J.Y., Hermodson M.A., Zalkin H. J. Biol. Chem. 255:1451-1457(1980).

[0671] 224. Glutathione peroxidases signatures (GSHPx) 

Glutathione peroxidase (EC 1.11.1.9) (GSHPx) [1,2] is an enzyme that catalyzes the reduction of hydroxyperoxides
by glutathione. Its main function is to protect against the damaging effect of endogenously formed hydroxyperoxides.
In higher vertebrates at least four forms of GSHPx are known to exist: a ubiquitous cytosolic form (GSHPx-1), a
gastrointestinal cytosolic form (GSHPx-GI) [3], a plasma secreted form (GSHPx-P) [4], and an epididymal secretory form
(GSHPx-EP). In addition to these characterized forms, the sequence of a protein of unknown function [5] has been
shown to be evolutionarily related to those of GSHPx's. In filarial nematode parasites such as Brugia pahangi the major
soluble cuticular protein, known as gp29, is a secreted GSHPx which could provide a mechanism of resistance to the
immune reaction of the mammalian host by neutralizing the products of the oxidative burst of leukocytes [6]. Escherichia
coli protein btuE, a periplasmic protein involved in the transport of vitamin B12, is also evolutionarily related to GSHPx's;
Glu / Leu / Phe / dehydrogenases active site

- Glutamate dehydrogenases (EC 1.4.1.2, EC 1.4.1.3, and EC 1.4.1.4) (GluDH) are enzymes that catalyze the NAD-
or NADP-dependent reversible deamination of glutamate into alpha-ketoglutarate [1,2]. GluDH isozymes are generally
involved with either ammonia assimilation or glutamate catabolism.

- Leucine dehydrogenase (EC 1.4.1.9) (LeuDH) is a NAD-dependent enzyme that catalyzes the reversible deamination
of leucine and several other aliphatic amino acids to their keto analogues [3].

- Phenylalanine dehydrogenase (EC 1.4.1.20) (PheDH) is a NAD-dependent enzyme that catalyzes the reversible
deamination of L-phenylalanine into phenylpyruvate [4].

- Valine dehydrogenase (EC 1.4.1.8) (ValDH) is a NADP-dependent enzyme that catalyzes the reversible deamination
of L-valine into 3-methyl-2-oxobutanoate [5].

[0661] These dehydrogenases are structurally and functionally related. A conserved lysine residue located in a
glycine-rich region has been implicated in the catalytic mechanism. The conservation of the region around this residue
allows the derivation of a signature pattern for this type of enzyme.

[0662] Consensus pattem[UVhx(2)-G-G-[SAG]-K-x-[GV]-x(3)-[DNSTHPLl [K Is the active site residue) Sequences 
krK>wn to belong to this class detected by the pattern ALL 

[0663] Note all known sequences from this family have Pro in the last position of the pattern with the exception of 
yeast GluDH which as Leu. 
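
As a minimal sketch, a consensus pattern of this kind can be checked against a sequence by translating it into a regular expression. The pattern string below is one reading of the printed pattern, namely [LIV]-x(2)-G-G-[SAG]-K-x-[GV]-x(3)-[DNST]-[PL]; the function names and the test sequences are hypothetical and serve only as illustration.

```python
import re

# One reading of the printed consensus pattern (an assumption of this sketch).
PATTERN = "[LIV]-x(2)-G-G-[SAG]-K-x-[GV]-x(3)-[DNST]-[PL]"


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        # Separate the residue specification from an optional repeat count, e.g. x(2).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":
            core = "."                      # x stands for any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {ABC} stands for any residue except A, B, C
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(core)
    return "".join(parts)


def active_site_lysine(sequence):
    """Return the 1-based position of the active site Lys, or None if no match."""
    match = re.search(prosite_to_regex(PATTERN), sequence)
    if match is None:
        return None
    # Six residues ([LIV], x(2), G, G, [SAG]) precede the lysine in the pattern.
    return match.start() + 6 + 1


# Hypothetical sequences, not taken from any database entry.
print(active_site_lysine("MALAAGGSKAGAAADPQR"))  # Pro in the last pattern position
print(active_site_lysine("MALAAGGSKAGAAADLQR"))  # Leu in the last pattern position
```

Accepting both Pro and Leu in the final position reflects the note that yeast GluDH carries Leu there; a stricter scan could drop L from the last class at the cost of missing that sequence.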

[ 1] Britton K.L., Baker P.J., Rice D.W., Stillman T.J. Eur. J. Biochem. 209:851-859(1992).
[ 2] Benachenhou-Lahfa N., Forterre P., Labedan B. J. Mol. Evol. 36:335-346(1993).
[ 3] Nagata S., Tanizawa K., Esaki N., Sakamoto Y., Ohshima T., Tanaka H., Soda K. Biochemistry 27:9056-9062(1988).
[ 4] Takada H., Yoshimura T., Ohshima T., Esaki N., Soda K. J. Biochem. 109:371-376(1991).
[ 5] Hutchinson C.R., Tang L. J. Bacteriol. 175:4176-4185(1993).

[0664] 222. GMC oxidoreductases signatures

The following FAD flavoprotein oxidoreductases have been found [1,2] to be evolutionarily related. These enzymes,
which are called 'GMC oxidoreductases', are listed below.

- Glucose oxidase (EC 1.1.3.4) (GOX) from Aspergillus niger. Reaction catalyzed: glucose + oxygen ->
delta-gluconolactone + hydrogen peroxide.
- Methanol oxidase (EC 1.1.3.13) (MOX) from fungi. Reaction catalyzed: methanol + oxygen -> acetaldehyde +
hydrogen peroxide.
- Choline dehydrogenase (EC 1.1.99.1) (CHD) from bacteria. Reaction catalyzed: choline + unknown acceptor ->
betaine acetaldehyde + reduced acceptor.
- Glucose dehydrogenase (GLD) (EC 1.1.99.10) from Drosophila. Reaction catalyzed: glucose + unknown acceptor ->
delta-gluconolactone + reduced acceptor.
- Cholesterol oxidase (CHOD) (EC 1.1.3.6) from Brevibacterium sterolicum and Streptomyces strain SA-COO. Reaction
catalyzed: cholesterol + oxygen -> cholest-4-en-3-one + hydrogen peroxide.
- AlkJ [3], an alcohol dehydrogenase from Pseudomonas oleovorans, which converts aliphatic medium-chain-length
alcohols into aldehydes.

This family also includes a lyase:

- (R)-mandelonitrile lyase (EC 4.1.2.10) (hydroxynitrile lyase) from plants [4], an enzyme involved in cyanogenesis,
the release of hydrogen cyanide from injured tissues.

These enzymes are proteins of size ranging from 556 (CHD) to 664 (MOX) amino acid residues which share a number of
regions of sequence similarity. One of these regions, located in the N-terminal section, corresponds to the FAD
ADP-binding domain. The function of the other conserved domains is not yet known; two of these domains were selected
as signature patterns. The first one is located in the N-terminal section of these enzymes, about 50 residues after
the ADP-binding domain, while the second one is located in the central section.
[0665] Consensus pattern: [GA]-[RKN]-x-[LIV]-G(2)-[GST](2)-x-[LIVM]-N-x(3)-[FYWA]-x(2)-[PAG]-x(5)-[DNESH]
Consensus pattern: [GS]-[PSTA]-x(2)-[ST]-P-x-[LIVM](2)-x(2)-S-G-[LIVM]-G

[ 1] Cavener D.R. J. Mol. Biol. 223:811-814(1992).
[ 2] Henikoff S., Henikoff J.G. Genomics 19:97-107(1994).
[ 3] van Beilen J.B., Eggink G., Enequist K., Bos R., Witholt B. Mol. Microbiol. 6:3121-3136(1992).
[ 4] Cheng I.P., Poulton J.E. Plant Cell Physiol. 34:1139-1143(1993).

[0666] 223. (GMP_synt_C) 

Glutamine amidotransferases class-I active site

[0667] A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine
and then to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as
glutamine amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptidic
subunit or as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence
pattern for class-I GATase.

[0648] Consensus pattern: [PAS]-[LIVMFYT]-[LIVMFY]-G-[LIVMFY]-C-[LIVMFYN]-G-x-[QEH]-x-[LIVMFA] [C is the
active site residue]

[ 1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973). 

[ 2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).

[ 3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).

[ 4] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989).

[ 5] Zalkin H., Argos P., Narayana S.V.L., Tiedeman A.A., Smith J.M. J. Biol. Chem. 260:3350-3354(1985).
[ 6] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[ 7] Tso J.Y., Hermodson M.A., Zalkin H. J. Biol. Chem. 255:1451-1457(1980).

[0649] 216. Glutamine amidotransferases class-II active site (GATase_2)

A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine and then
to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as glutamine
amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptidic subunit or
as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence similarities two
classes of GATase domains have been identified [2,3]: class-I (also known as trpG-type) and class-II (also known as
purF-type). Class-II GATase domains have been found in the following enzymes:

- Amido phosphoribosyltransferase (glutamine phosphoribosylpyrophosphate amidotransferase) (EC 2.4.2.14). An enzyme
which catalyzes the first step in purine biosynthesis, the transfer of the ammonia group of glutamine to PRPP to form
5-phosphoribosylamine (gene purF in bacteria, ADE4 in yeast).
- Glucosamine--fructose-6-phosphate aminotransferase (EC 2.6.1.16). This enzyme catalyzes a key reaction in amino
sugar synthesis, the formation of glucosamine 6-phosphate from fructose 6-phosphate and glutamine (gene glmS in
Escherichia coli, nodM in Rhizobium, GFA1 in yeast).
- Asparagine synthetase (glutamine-hydrolyzing) (EC 6.3.5.4). This enzyme is responsible for the synthesis of
asparagine from aspartate and glutamine.

A cysteine is present at the N-terminal extremity of the mature form of all these enzymes. The cysteine has
been shown, in amido phosphoribosyltransferase [4] and in asparagine synthetase [5], to be important for the catalytic
mechanism.

[0650] Consensus pattern: <x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG] [C is the active site residue]
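
The "<" in a pattern of this kind anchors the match to the N-terminus, and x(0,11) tolerates up to eleven leading residues before the catalytic cysteine. The following sketch, which assumes the pattern reads <x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG], shows one way to express that anchoring; the function name and the sequences are hypothetical.

```python
import re

# Regular expression for <x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG]. The "<" anchor is
# provided implicitly by re.match, which only matches at the start of the string.
CLASS_II_GATASE = re.compile(r".{0,11}(C)[GS][IV][LIVMFYW][AG]")


def active_site_cysteine(sequence):
    """Return the 1-based position of the candidate active site Cys, or None."""
    m = CLASS_II_GATASE.match(sequence)
    return None if m is None else m.start(1) + 1


# Hypothetical sequences, not taken from any database entry.
print(active_site_cysteine("CGIFGAVK"))              # Cys is the first residue
print(active_site_cysteine("MSTPQLLRTLSCGIFGAVK"))   # 11 residues precede the Cys
print(active_site_cysteine("MSTPQLLRTLSACGIFGAVK"))  # 12 residues precede the Cys
```

Because the match is anchored, a motif that occurs further than eleven residues from the start is not reported, which is the intended behaviour for a residue expected at the N-terminal extremity of the mature protein.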

[ 1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973).

[ 2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).

[ 3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).

[ 4] van Heeke G., Schuster M. J. Biol. Chem. 264:5503-5509(1989).

[ 5] Vollmer S.J., Switzer R.L., Hermodson M.A., Bower S.G., Zalkin H. J. Biol. Chem. 258:10582-10585(1983).
[0651] 217. GDP dissociation inhibitor (GDI) 

[0652] [1] Schalk I, Zeng K, Wu SK, Stura EA, Matteson J, Huang M, Tandon A, Wilson IA, Balch WE; Nature 1996;
381:42-48.

[0653] 218. Oxidoreductase family (GFO_IDH_MocA)

[0654] This family of enzymes utilises NADP or NAD. This family is called the GFO/IDH/MOCA family in Swiss-Prot.
[0655] [1] Kingston RL, Scopes RK, Baker EN; Structure 1996;4:1413-1428.
[0656] 219. GHMP kinases putative ATP-binding domain

The following kinases contain, in their N-terminal section, a conserved Gly/Ser-rich region which is probably involved
in the binding of ATP [1]. These kinases are listed below.

- Galactokinase (EC 2.7.1.6).
- Homoserine kinase (EC 2.7.1.39).
- Mevalonate kinase (EC 2.7.1.36).
- Phosphomevalonate kinase (EC 2.7.4.2).

This group of kinases was called 'GHMP' (from the first letter of their substrates).

Consensus pattern: [LIVM]-[PK]-x-[GSTA]-x(0,1)-G-L-[GS]-S-S-[GSA]-[GSTAC]
[0657] [ 1] Tsay Y.H., Robinson G.W. Mol. Cell. Biol. 11:620-631(1991).
[0658] 220. Glucose inhibited division protein A family signatures (GIDA)

Bacterial glucose inhibited division protein A (gene gidA) is a protein of 70 Kd whose function is not yet known and
whose sequence is highly conserved. It is evolutionarily related to yeast hypothetical protein YGL236C, Caenorhabditis
elegans hypothetical protein F52H3.2 and a Bacillus subtilis protein called gid (and which is different from B. subtilis
gidA). Two highly conserved regions were selected as signature patterns. Both regions are located in the central region
of the protein.

[0659] Consensus pattern: [GS]-[PT]-x-Y-C-P-S-[LIVM]-E-x-K-[LIVM]-x-[KR]

Consensus pattern: A-G-Q-x-[NT]-G-x(2)-G-Y-x-E-[SAG](3)-[QS]-G-[LIVM](2)-A-G-[LIVMT]-N-A
[0660] 221. (GLFV_dehydrog)

of genes containing the GATA region, including vitellogenin genes [6].
- Ustilago maydis urbs1 [7], a protein involved in the repression of the biosynthesis of siderophores.
- Fission yeast protein GAF2.

All these transcription factors contain a pair of highly similar 'zinc finger' type domains with the consensus
sequence C-x2-C-x17-C-x2-C. Some other proteins contain a single zinc finger motif highly related to those of the GATA
transcription factors. These proteins are:

- Drosophila box A-binding factor (ABF) (also known as protein serpent (gene srp)), which may function as a
transcriptional activator protein and may play a key role in the organogenesis of the fat body.
- Emericella nidulans areA [8], a transcriptional activator which mediates nitrogen metabolite repression.
- Neurospora crassa nit-2 [9], a transcriptional activator which turns on the expression of genes coding for enzymes
required for the use of a variety of secondary nitrogen sources, during conditions of nitrogen limitation.
- Neurospora crassa white collar proteins 1 and 2 (WC-1 and WC-2), which control expression of light-regulated genes.
- Saccharomyces cerevisiae DAL80 (or UGA43), a negative nitrogen regulatory protein.
- Saccharomyces cerevisiae GLN3, a positive nitrogen regulatory protein.
- Saccharomyces cerevisiae GAT1.
- Saccharomyces cerevisiae GZF3.

[0646] Consensus pattern: C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C [The four C's are
zinc ligands]
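
To illustrate how the four zinc ligands can be located once a sequence matches, the sketch below wraps each cysteine of the consensus in a capturing group. It assumes the pattern reads C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C; the function name and the example sequence are hypothetical.

```python
import re

# The four cysteines are captured so that their positions can be reported.
GATA_FINGER = re.compile(
    r"(C).[DN](C).{4,5}[ST].{2}W[HR][RK].{3}[GN].{3,4}(C)N[AS](C)"
)


def zinc_ligands(sequence):
    """Return, for each match, the 1-based positions of the four Cys zinc ligands."""
    return [
        [m.start(group) + 1 for group in (1, 2, 3, 4)]
        for m in GATA_FINGER.finditer(sequence)
    ]


# Hypothetical sequence built to satisfy the pattern; not a database entry.
example = "AA" + "CVNCGATATPLWRRDRTGHYLCNAC" + "GLY"
print(zinc_ligands(example))
```

In this example the ligands are separated by 2, 17 and 2 residues, in line with the C-x2-C-x17-C-x2-C spacing described for these domains.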

[ 1] Trainor C.D., Evans T., Felsenfeld G., Boguski M.S. Nature 343:92-96(1990).

[ 2] Lee M.E., Temizer D.T., Clifford J.A., Quertermous T. J. Biol. Chem. 266:16168-16192(1991).

[ 3] Ho I.-C., Vorhees P., Marin N., Oakley B.K., Tsai S.-F., Orkin S.H., Leiden J.M. EMBO J. 10:1187-1192(1991).

[ 4] Spieth J., Shim Y.H., Lea K., Conrad R., Blumenthal T. Mol. Cell. Biol. 11:4651-4659(1991).

[ 5] Drevet J.R., Skeiky Y.A., Iatrou K. J. Biol. Chem. 269:10660-10667(1994).

[ 6] Hawkins M.G., McGhee J.D. J. Biol. Chem. 270:14666-14671(1995).

[ 7] Voisard C.P.O., Wang J., Xu P., Leong S.A., McEvoy J.L. Mol. Cell. Biol. 13:7091-7100(1993).

[ 8] Arst H.N. Jr., Kudla B., Martinez-Rossi N.M., Caddick M.X., Sibley S., Davies R.W. Trends Genet. 5:291-291(1989).

[ 9] Fu Y.-H., Marzluf G.A. Mol. Cell. Biol. 10:1056-1065(1990).
[0647] 215. Glutamine amidotransferases class-I active site (GATase)

A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine and then
to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as glutamine
amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptidic subunit or
as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence similarities two
classes of GATase domains have been identified [2,3]: class-I (also known as trpG-type) and class-II (also known as
purF-type). Class-I GATase domains have been found in the following enzymes:

- The second component of anthranilate synthase (AS) (EC 4.1.3.27) [4]. AS catalyzes the biosynthesis of anthranilate
from chorismate and glutamine. AS is generally a dimeric enzyme: the first component can synthesize anthranilate using
ammonia rather than glutamine, whereas component II provides the GATase activity. In some bacteria and in fungi the
GATase component of AS is part of a multifunctional protein that also catalyzes other steps of the biosynthesis of
tryptophan.
- The second component of 4-amino-4-deoxychorismate (ADC) synthase (EC 4.1.3.-), a dimeric prokaryotic enzyme that
functions in the pathway that catalyzes the biosynthesis of para-aminobenzoate (PABA) from chorismate and glutamine.
The second component (gene pabA) provides the GATase activity [4].
- CTP synthase (EC 6.3.4.2). CTP synthase catalyzes the final reaction in the biosynthesis of pyrimidine, the
ATP-dependent formation of CTP from UTP and glutamine. CTP synthase is a single chain enzyme that contains two
distinct domains; the GATase domain is in the C-terminal section [2].
- GMP synthase (glutamine-hydrolyzing) (EC 6.3.5.2). GMP synthase catalyzes the ATP-dependent formation of GMP from
xanthosine 5'-phosphate and glutamine. GMP synthase is a single chain enzyme that contains two distinct domains; the
GATase domain is in the N-terminal section [5].
- Glutamine-dependent carbamoyl-phosphate synthase (EC 6.3.5.5) (GD-CPSase), an enzyme involved in both arginine and
pyrimidine biosynthesis and which catalyzes the ATP-dependent formation of carbamoyl phosphate from glutamine and
carbon dioxide. In bacteria GD-CPSase is composed of two subunits: the large chain (gene carB) provides the CPSase
activity, while the small chain (gene carA) provides the GATase activity. In yeast the enzyme involved in arginine
biosynthesis is also composed of two subunits: CPA1 (GATase) and CPA2 (CPSase). In most eukaryotes, the first three
steps of pyrimidine biosynthesis are catalyzed by a large multifunctional enzyme (called URA2 in yeast, rudimentary in
Drosophila, and CAD in mammals). The GATase domain is located at the N-terminal extremity of this polyprotein [6].
- Phosphoribosylformylglycinamidine synthase II (EC 6.3.5.3), an enzyme that catalyzes the fourth step in the de novo
biosynthesis of purines. In some species of bacteria, FGAM synthase II is composed of two subunits: a small chain
(gene purQ) which provides the GATase activity and a large chain (gene purL) which provides the aminator activity.
- The histidine amidotransferase hisH, an enzyme that catalyzes the fifth step in the biosynthesis of histidine in
prokaryotes.

In the second component of AS a cysteine has been shown [7] to be essential for the amidotransferase activity.
The sequence around this residue is well conserved in all the above GATase domains and can be used as a signature

members:

- Mammalian tub, a hydrophilic protein of about 500 residues, which could be involved in the hypothalamic
regulation of body weight.
- Human protein TULP1 [3], which may be involved in retinitis pigmentosa 14, a retinal degeneration disease.
- Mouse protein p4-6, whose function is not known.
- Caenorhabditis elegans hypothetical protein F10B5.4.
- Several fragmentary sequences from plants, Drosophila and human ESTs.

While the N-terminal part of these proteins is conserved neither in length nor in sequence, the C-terminal 250
residues are highly conserved. Therefore, two regions were selected in the C-terminal part as signature patterns. The
second region is located at the C-terminal extremity and contains a penultimate cysteine residue that could be critical
to the normal functioning of these proteins.

Consensus pattern: F-[KHQ]-G-R-V-[ST]-x-A-S-V-K-N-F-Q

Consensus pattern: A-F-[AG]-I-[SAC]-[LIVM]-[ST]-S-F-x-[GST]-K-x-A-C-E

[ 1] Kleyn P.W., Fan W., Kovats S.G., Lee J.L., Pulido J.C., Wu Y., Berkemeier L.R., Misumi D.J., Holmgren L., Chariat
O., Woolf E.A., Tayber O., Brody T., Shu R., Hawkins R., Kennedy B., Baldini L., Ebeling C., Alperin G.D., Deeds J.,
Lakey N.D., Culpepper J., Chen H., Gluecksmann-Kuis M.A., Carlson G.A., Duyk G.M., Moore K.J. Cell 85:281-290(1996).
[ 2] Noben-Trauth K., Naggert J.K., North M.A., Nishina P.M. Nature 380:534-538(1996).
[ 3] North M.A., Naggert J.K., Yan Y., Noben-Trauth K., Nishina P.M. Proc. Natl. Acad. Sci. U.S.A. 94:3128-3133(1997).
[1533] 651. Eukaryotic DNA topoisomerase I active site

DNA topoisomerase I (EC 5.99.1.2) [1,2,3,4,E1] is one of the two types of enzyme that catalyze the interconversion
of topological DNA isomers. Type I topoisomerases act by catalyzing the transient breakage of DNA, one strand at a
time, and the subsequent rejoining of the strands. When a eukaryotic type I topoisomerase breaks a DNA backbone
bond, it simultaneously forms a protein-DNA link where the hydroxyl group of a tyrosine residue is joined to a
3'-phosphate on DNA, at one end of the enzyme-severed DNA strand. In eukaryotic and pox virus topoisomerases I,
there are a number of conserved residues in the region around the active site tyrosine.
Consensus pattern: [DEN]-x(6)-[GS]-[IT]-S-K-x(2)-Y-[LIVM]-x(3)-[LIVM] [Y is the active site tyrosine]
[ 1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990).
[ 2] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47(1995).
[ 3] Lynn R.M., Bjornsti M.-A., Caron P.R., Wang J.C. Proc. Natl. Acad. Sci. U.S.A. 86:3559-3563(1989).
[ 4] Roca J. Trends Biochem. Sci. 20:156-160(1995).
[E1]
[1534] 652. Transaldolase signatures

Transaldolase (EC 2.2.1.2) catalyzes the reversible transfer of a three-carbon ketol unit from sedoheptulose
7-phosphate to glyceraldehyde 3-phosphate to form erythrose 4-phosphate and fructose 6-phosphate. This enzyme, together
with transketolase, provides a link between the glycolytic and pentose-phosphate pathways. Transaldolase is an
enzyme of about 34 Kd whose sequence has been well conserved throughout evolution. A lysine has been implicated
[1] in the catalytic mechanism of the enzyme; it acts as a nucleophilic group that attacks the carbonyl group of
fructose-6-phosphate. Transaldolase is evolutionarily related [2] to a bacterial protein of about 20 Kd (known as talC
in Escherichia coli), whose exact function is not yet known. Two signature patterns have been developed for these
proteins. The first, located in the N-terminal section, contains a perfectly conserved pentapeptide; the second
includes the active site lysine.

Consensus pattern: [DG]-[IVSA]-T-[ST]-N-P-[STA]-[LIVMF](2)

Consensus pattern: [LIVM]-x-[LIVM]-K-[LIVM]-[PAS]-x-[ST]-x-[DENQPAS]-G-[LIVM]-x-[AGV]-x-[QEKRST]-x-[LIVM]
[K is the active site residue]

[ 1] Miosga T., Schaaff-Gerstenschlaeger I., Franken E., Zimmermann F.K. Yeast 9:1241-1249(1993).
[ 2] Reizer J., Reizer A., Saier M.H. Jr. Microbiology 141:961-971(1995).

[1535] 653. (Transpeptidase) Penicillin binding protein transpeptidase domain

[1536] The active site serine (residue 337 in Swiss:P14677) is conserved in all members of this family.

[1537] [1] Pares S, Mouz N, Petillot Y, Hakenbeck R, Dideberg O; Nat Struct Biol 1996;3:284-289.

[1538] 654. Trehalase signatures 

Trehalase (EC 3.2.1.28) is the enzyme responsible for the degradation of the disaccharide alpha,alpha-trehalose,
yielding two glucose subunits [1]. It is an enzyme found in a wide variety of organisms and whose sequence has been
highly conserved throughout evolution. Two of the most highly conserved regions have been selected as signature
patterns. The first pattern is located in the central section, the second one is in the C-terminal region.
Consensus pattern: P-G-G-R-F-x-E-x-Y-x-W-D-x-Y
Consensus pattern: Q-W-D-x-P-x-[GA]-W-[PAS]-P
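
When a family is described by two signatures, a sequence can be scanned for both and the hit positions compared with the expected layout (here, a central hit followed by a C-terminal one). The sketch below assumes the two patterns read P-G-G-R-F-x-E-x-Y-x-W-D-x-Y and Q-W-D-x-P-x-[GA]-W-[PAS]-P; the dictionary keys, the function name and the short test sequence are hypothetical.

```python
import re

# Hand-translated regular expressions for the two signatures (labels are hypothetical).
SIGNATURES = {
    "trehalase_1": re.compile(r"PGGRF.E.Y.WD.Y"),
    "trehalase_2": re.compile(r"QWD.P.[GA]W[PAS]P"),
}


def scan(sequence):
    """Return the 1-based start positions of every hit for each signature."""
    return {
        name: [m.start() + 1 for m in regex.finditer(sequence)]
        for name, regex in SIGNATURES.items()
    }


# Hypothetical sequence built to contain both motifs; not a database entry.
example = "MK" + "PGGRFAEAYAWDAY" + "LLLL" + "QWDAPAGWAP" + "K"
hits = scan(example)
print(hits)
```

Requiring both signatures, with the first upstream of the second, is a stricter criterion than either pattern alone and may reduce chance matches.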

[ 1] Kopp M., Mueller H., Holzer H. J. Biol. Chem. 268:4766-4774(1993).
[ 2] Henrissat B., Bairoch A. Biochem. J. 293:781-788(1993).
[E1]

[1539] 655. Trehalose-6-phosphate synthase domain

[1540] OtsA (Trehalose-6-phosphate synthase) is homologous to regions in the subunits of yeast trehalose-6-phosphate
synthase/phosphate complex [1].

[1541] [1] Kaasen I, McDougall J, Strom AR; Gene 1994;145:9-15.
[1542] 656. Tropomyosins signature

Pseudomonas denitrificans (cobA) or Methanobacterium ivanovii (gene corA) SUMT is a protein of about 25 to 30 Kd.
In Escherichia coli and related bacteria, the cysG protein, which is involved in the biosynthesis of siroheme, is a
multifunctional protein composed of an N-terminal domain, probably involved in transforming precorrin-2 into siroheme,
and a C-terminal domain which has SUMT activity. The sequence of SUMT is related to that of a number of P. denitrificans
and Salmonella typhimurium enzymes involved in the biosynthesis of cobalamin which also seem to be SAM-dependent
methyltransferases [3,4]. The similarity is especially strong with two of these enzymes: cobI/cbiL, which encodes
S-adenosyl-L-methionine--precorrin-2 methyltransferase, and cobM/cbiF, whose exact function is not known. Two signature
patterns have been developed for these enzymes. The first corresponds to a well conserved region in the N-terminal
extremity (called region 1 in [1,3]) and the second to a less conserved region located in the central part of
these proteins (this pattern spans what are called regions 2 and 3 in [1,3]).

Consensus pattern: [LIVM]-[GS]-[STAL]-G-P-G-x(3)-[LIVMFY]-[LIVM]-T-[LIVM]-[KRHQG]-[AG]

Consensus pattern: V-x(2)-[LI]-x(2)-G-D-x(3)-[FYW]-[GS]-x(8)-[LIVF]-x(5,6)-[LIVMFYWPAC]-x-[LI

[ 1] Blanche F., Robin C., Couder M., Faucher D., Cauchois L., Cameron B., Crouzet J. J. Bacteriol. 173:4637-4645(1991).
[ 2] Robin C., Blanche F., Cauchois L., Cameron B., Couder M., Crouzet J. J. Bacteriol. 173:4893-4896(1991).
[ 3] Crouzet J., Cameron B., Cauchois L., Rigault S., Rouyez M.-C., Blanche F., Thibaut D., Debussche L. J. Bacteriol.
172:5980-5990(1990).
[ 4] Roth J.R., Lawrence J.G., Rubenfield M., Kieffer-Higgins S., Church G.M. J. Bacteriol. 175:3303-3316(1993).
[ 5] Mattheakis L.C., Shen W.H., Collier R.J. Mol. Cell. Biol. 12:4026-4037(1992).
[1528] 646. Tudor domain

Domain of unknown function present in several RNA-binding proteins, copies in the Drosophila Tudor protein. Slight
ambiguities in the alignment. Number of members: 18

[1] Medline: 97200561 Tudor domains in proteins that interact with RNA. Ponting CP; Trends Biochem Sci 1997;22:51-52.
[2] Medline: 97157029 The human EBNA-2 coactivator p100: multidomain organization and relationship to the
staphylococcal nuclease fold and to the tudor protein involved in Drosophila melanogaster development. Callebaut I,
Mornon JP; Biochem J 1997;321:125-132.
[1529] 647. Terpene synthase family

It has been suggested that this gene family be designated tps (for terpene synthase) [1]. It has been split into six
subgroups on the basis of phylogeny, called tpsa-tpsf.

tpsa includes vetispiradiene synthase, Swiss:Q39979, 5-epi-aristolochene synthase, Swiss:Q40577, and
(+)-delta-cadinene synthase, Swiss:P93665.
tpsb includes (-)-limonene synthase, Swiss:Q40322.
tpsc includes kaurene synthase A, Swiss:O04408.
tpsd includes taxadiene synthase, Swiss:Q41594, pinene synthase, Swiss:O24475, and myrcene synthase, Swiss:O24474.
tpse includes kaurene synthase B.
tpsf includes linalool synthase.
Number of members: 51

[1] Medline: 97413772
Monoterpene synthases from grand fir (Abies grandis). cDNA isolation, characterization, and functional expression of
myrcene synthase, (-)-(4S)-limonene synthase, and (-)-(1S,5S)-pinene synthase.
Bohlmann J, Steele CL, Croteau R;
J Biol Chem 1997;272:21784-21792.
[1530] 648. ThiF family

This family contains a repeated domain in ubiquitin activating enzyme E1 and members of the bacterial
ThiF/MoeB/HesA family. Number of members: 87
[1531] 649. Thioester dehydrase

Members of this family are involved in fatty acid biosynthesis.
Number of members: 19

[1] Medline: 96398612
Structure of a dehydratase-isomerase from the bacterial pathway for biosynthesis of unsaturated fatty acids: two
catalytic activities in one active site.
Leesong M, Henderson BS, Gillig JR, Schwab JM, Smith JL;
Structure 1996;4:253-264.
Database reference: SCOP; 1mka; fa; [SCOP-USA][CATH-PDBSUM]
Database reference: PFAMB; PB058036;
[1532] 650. Tub family signatures 

The mouse tubby mutation is the cause of maturity-onset obesity, insulin resistance and sensory deficits. This mutation
maps to a gene, tub [1,2], which codes for a protein that belongs to a family which currently consists of the following

to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal
section; in particular the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has
been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA
synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6] and seem to
share the same tertiary structure based on a Rossmann fold.

Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[ 2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P. Science 226:1315-1317(1984).
[ 3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988).
[ 4] Delarue M., Moras D. BioEssays 15:675-687(1993).
[ 5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).
[ 6] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).

[1548] 661. (tRNA-synt_1c) tRNA synthetases class I (E and Q)

[1549] Other tRNA synthetase sub-families are too dissimilar to be included.

This family includes only glutamyl and glutaminyl tRNA synthetases.

In some organisms, a single glutamyl-tRNA synthetase aminoacylates both tRNA(Glu) and tRNA(Gln).
[1550] [1] Rath VL, Silvian LF, Beijer B, Sproat BS, Steitz TA; Structure 1998;6:439-449.
[1551] 662. (tRNA-synt_1d) tRNA synthetases class I (R)
[1552] Other tRNA synthetase sub-families are too dissimilar to be included.
This family includes only arginyl tRNA synthetase.
[1553] 663. Aminoacyl-transfer RNA synthetases class-II signatures (tRNA-synt_2)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them 
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty 
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally 
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all 
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline,
serine, and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in
their catalytic domain for the binding of ATP and amino acid which is different from the Rossmann fold observed for the
class I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least three
conserved regions are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVM
[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[ 2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[ 3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).
[ 4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[ 5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).
[ 6] Cusack S. Biochimie 75:1077-1081(1993).
[ 7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature 347:249-255(1990).
[ 8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).
[1554] 664. Aminoacyl-transfer RNA synthetases class-I signature (tRNA-synt_1e)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal
section; in particular the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has
been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA
synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6] and seem to
share the same tertiary structure based on a Rossmann fold.

Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]
[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[ 2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P. Science 226:1316-1317(1984).
[ 3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988).
[ 4] Delarue M., Moras D. BioEssays 15:675-687(1993).
[ 5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).
[ 6] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).

striated muscle,

tropomyosin mediates the interactions between the troponin complex and actin so as to regulate muscle contraction

that forms a coiled-coil dimer. Muscle isoforms of tropomyosin are characterized by having 284 amino acid residues

in their N-terminal region. The signature pattern for tropomyosins is based on a very conserved region in the C-terminal
section of tropomyosins and which is present in both muscle and non-muscle forms.
Consensus pattern: L-K-E-A-E-x-R-A-E

[ 1] Smillie L.B. Trends Biochem. Sci. 4:151-155(1979).
[ 2] McLeod A.R. BioEssays 6:208-212(1986).
[1543] 657. Troponin

[1] Medline: 87144593
Structure of co-crystals of tropomyosin and troponin.
White SP, Cohen C, Phillips GN Jr;
Nature 1987;325:826-828.

[2] Medline: 95155315
A direct regulatory role for troponin T and a dual role for troponin C in the Ca2+ regulation of muscle contraction.
Potter JD, Sheng Z, Pan BS, Zhao J;
J Biol Chem 1995;270:2557-2562.

[3] Medline: 95324796
The troponin complex and regulation of muscle contraction.
Farah CS, Reinach FC;
FASEB J 1995;9:755-767.
[1544] 658. (Tiyp mucin) Mucin-like glycoprotein 

T^ife family of trypanosomal proteins resemble vertebrate mucins. The protein consists of three reoions The 
N OKI C temi«iH are oonsenred between all members of the family, whereas the centraH^Sn fe ,»t wJi^^ 
and contains a large number of threonine residues which can be glycosylated 111 ^ ^ 

Indirect evidence suggested that these genes might encode the core protein of parasite mucins olvcooroteins th,f 
were proposed to be involved in the interaction with, and invasion of, n«mmalian3o^l^^ glycoproteins that 

[1] Di Noia JM, Sanchez DO, Frasch AC; J Biol Chem 1995;270:24146-24149.

[2] Di Noia JM, D'Orso I, Aslund L, Sanchez DO, Frasch AC; J Biol Chem 1998;273:10843-10850.

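[1546] 659. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
Several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal
section, in particular the consensus tetrapeptide His-Ile-Gly-His ('HIGH'), which is very well conserved [2]. The 'HIGH' region has
been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA
synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6].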

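[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [ 2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P.
Science 226:1316-1317(1984). [ 3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988). [ 4] Delarue M., Moras
D. BioEssays 15:675-687(1993). [ 5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [ 6] Nagel G.M., Doolittle R.F.
Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).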

[1547] 660. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1b)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
Schulz H., Elzinga M. J. Biol. Chem. 265:10424-10429(1990). [ 3] Igual J.C., Gonzalez-Bosch C., Dopazo J.,
Perez-Ortin J.E. J. Mol. Evol. 35:147-155(1992). [ 4] Baker M.E., Billheimer J.T., Strauss J.F. III DNA Cell Biol. 10:695-698
(1991).

[1558] 668. Thioredoxin family active site 

Thioredoxins [1 to 4] are small proteins of approximately one hundred amino-acid residues which participate in various
redox reactions via the reversible oxidation of an active center disulfide bond. They exist in either a reduced form or
an oxidized form where the two cysteine residues are linked in an intramolecular disulfide bond. Thioredoxin is present
in prokaryotes and eukaryotes and the sequence around the redox-active disulfide bond is well conserved.
Bacteriophage T4 also encodes a thioredoxin, but its primary structure is not homologous to bacterial, plant and vertebrate
thioredoxins. A number of eukaryotic proteins contain domains evolutionarily related to thioredoxin; all of them seem to
be protein disulphide isomerases (PDI). PDI (EC 5.3.4.1) [5,6,7] is an endoplasmic reticulum enzyme that catalyzes
the rearrangement of disulfide bonds in various proteins. The various forms of PDI which are currently known are:
- PDI major isozyme; a multifunctional protein that also functions as the beta subunit of prolyl 4-hydroxylase (EC
1.14.11.2), as a component of oligosaccharyl transferase (EC 2.4.1.119), as thyroxine deiodinase (EC 3.8.1.4), as
glutathione-insulin transhydrogenase (EC 1.8.4.2) and as a thyroid hormone-binding protein.
- ERp60 (ER-60; 58 Kd microsomal protein). ERp60 was originally thought to be a phosphoinositide-specific
phospholipase C isozyme and later to be a protease.
- ERp72.
- P5.
All PDI contain two or three (ERp72) copies of the thioredoxin domain. Bacterial
proteins that act as thiol:disulfide interchange proteins and allow disulfide bond formation in some periplasmic proteins
also contain a thioredoxin domain. These proteins are:
- Escherichia coli dsbA (or prfA) and its orthologs in Vibrio cholerae (tcpG) and Haemophilus influenzae (por).
- Escherichia coli dsbC (or xprA) and its orthologs in Erwinia chrysanthemi and Haemophilus influenzae.
- Escherichia coli dsbD (or dipZ) and its Haemophilus influenzae ortholog.
- Escherichia coli dsbE (or ccmG) and orthologs in Haemophilus influenzae, Rhodobacter capsulatus (helX) and
Rhizobiaceae (cycY and tlpA).
Consensus pattern: [LIVMF]-[LIVMSTA]-x-[LIVMFYC]-[FYWSTHE]-x(2)-[FYWGTN]-C-[GATPLVE]-[PHYWSTA]-C-x(6)-[LIVMFYWT]
[The two C's form the redox-active bond]

[ 1] Holmgren A. Annu. Rev. Biochem. 54:237-271(1985). [ 2] Gleason F.K., Holmgren A. FEMS Microbiol. Rev. 54:
271-297(1988). [ 3] Holmgren A. J. Biol. Chem. 264:13963-13966(1989). [ 4] Eklund H., Gleason F.K., Holmgren A.
Proteins 11:13-28(1991). [ 5] Freedman R.B., Hawkins H.C., Murant S.J., Reid L. Biochem. Soc. Trans. 16:96-99(1988).
[ 6] Kivirikko K.I., Myllyla R., Pihlajaniemi T. FASEB J. 3:1609-1617(1989). [ 7] Freedman R.B., Hirst T.R., Tuite M.F.
Trends Biochem. Sci. 19:331-336(1994).

[1559] 669. (Transcript fac2) Transcription factor TFIIB repeat signature

In eukaryotes the initiation of transcription of protein encoding genes by polymerase II is modulated by general and
specific transcription factors. The general transcription factors operate through common promoter elements (such as
the TATA box). At least seven different proteins associate to form the general transcription factors: TFIIA, -IIB, -IID,
-IIE, -IIF, -IIG, and -IIH [1]. Transcription factor IIB (TFIIB) plays a central role in the transcription of class II genes. It
associates with a complex of TFIID-IIA bound to DNA (DA complex) to form a ternary complex TFIID-IIA-IIB (DAB
complex) which is then recognized by RNA polymerase II [2,3]. TFIIB is a protein of about 315 to 340 amino acid
residues which contains, in its C-terminal part, an imperfect repeat of a domain of about 75 residues. This repeat could
contribute an element of symmetry to the folded protein. The following proteins have been shown to be evolutionarily
related to TFIIB:
- An archaebacterial TFIIB homolog. In Pyrococcus woesei a previously undetected open reading
frame has been shown [4] to be highly related to TFIIB.
- Fungal transcription factor IIIB 70 Kd subunit (gene PCF4/TDS4/BRF1) [5]. This protein is a general activator of
RNA polymerase III transcription and plays a role analogous to that of TFIIB in pol III transcription.
The central section of the repeated domain, which is the most conserved part of
that domain, has been selected as a signature pattern.

Consensus pattern: G-[KR]-x(3)-[STAGN]-x-[LIVMYA]-[GSTA](2)-[CSAV]-[LIVM]-[LIVMFY]-[LIVMA]-[GSA]-[STAC]
[ 1] Weinmann R. Gene Expr. 2:81-91(1992). [ 2] Hawley D. Trends Biochem. Sci. 16:317-318(1991). [ 3] Ha I., Lane
W.S., Reinberg D. Nature 352:689-695(1991). [ 4] Ouzounis C., Sander C. Cell 71:189-190(1992). [ 5] Khoo B., Brophy
B., Jackson S.P. Genes Dev. 8:2879-2890(1994).
[1560] 670. (transcript fact) MADS-box domain signature and profile

A number of transcription factors contain a conserved domain of 56 amino-acid residues, sometimes known as the
MADS-box domain [E1]. They are listed below:
- Serum response factor (SRF) [1], a mammalian transcription factor
that binds to the Serum Response Element (SRE). This is a short sequence of dyad symmetry located 300 bp to the
5' end of the transcription initiation site of genes such as c-fos.
- Mammalian myocyte-specific enhancer factors 2A to 2D (MEF2A to MEF2D). These proteins are transcription factors
which bind specifically to the MEF2 element present in the regulatory regions of many muscle-specific genes.
- Drosophila myocyte-specific enhancer factor 2 (MEF2).
- Yeast GRM/PRTF protein (gene MCM1) [2], a transcriptional regulator of mating-type-specific genes.
- Yeast arginine metabolism regulation protein I (gene ARGR1 or ARG80).
- Yeast transcription factor RLM1.
- Yeast transcription factor SMP1.
- Arabidopsis thaliana agamous protein (AG) [3], a probable transcription factor involved in regulating genes
that determine stamen and carpel development in wild-type flowers. Mutations in the AG gene result in the replacement
[1555] 665. Aminoacyl-transfer RNA synthetases class-II signatures (tRNA synt 2b)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine,
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in their
catalytic domain for the binding of ATP and amino acid which is different to the Rossmann fold observed for the class
I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity, however at least three conserved
regions are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]
[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [ 2] Delarue M., Moras D. BioEssays 15:675-687(1993). [ 3]
Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [ 4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:
8121-8125(1991). [ 5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991). [ 6] Cusack S.,
^ZViil^^Z ? ^ ^'^^'^ C.. Haertlein M.. Nassar N.^ebeli R. Natufe 

^'-'^^ "-^^"^ ^^^"^"^ ^^^^ S. Nudeic Adds Res. 18:305-312(1990) 

[1556] 666. Thaumatin family signature

Thaumatin [1] is an intensely sweet-tasting protein (100 000 times sweeter than sucrose on a molar basis) from
Thaumatococcus daniellii, an African bush. The protein is made of about 200 residues and contains 8 disulfide bonds.
A number of proteins have been found to be related to thaumatins. These proteins are listed below (references are only
provided for recently determined sequences).
- A maize alpha-amylase/trypsin inhibitor.
- Two tobacco pathogenesis-related proteins: PR-R major and minor forms, which are induced after infection with viruses.
- Salt-induced protein NP24 from tomato.
- Osmotin, a salt-induced protein from tobacco.
- Osmotin-like proteins OSML13, OSML15 and OSML81 from potato [2].
- P21, a leaf protein from soybean.
- PWIR2, a leaf protein from wheat.
- Zeamatin, a maize antifungal protein [3].
The exact biological function of all these proteins is not yet known. A conserved region that includes
three cysteine residues known (in thaumatin) to be involved in disulfide bonds has been selected as a signature pattern.

[Schematic of the disulfide bonds linking the conserved cysteines; not legible in this copy.]
'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.

Consensus pattern: G-x-[GF]-x-C-x-T-[GA]-D-C-x(1,2)-G-x(2,3)-C

! 2! B '"ch.^f T rM'~\'' p 'J^o^^^^^^^ "^^^ C.T Gene 18:1-12(1982). 

S?o.M:;pTn:C^i.ic2:u^^^^^^^^ 

[1557] 667. Thiolases signatures

Two different types of thiolase [1,2,3] are found both in eukaryotes and in prokaryotes: acetoacetyl-CoA thiolase (EC
2.3.1.9) and 3-ketoacyl-CoA thiolase (EC 2.3.1.16). 3-ketoacyl-CoA thiolase (also called thiolase I) has a broad
chain-length specificity for its substrates and is involved in degradative pathways such as fatty acid beta-oxidation.
Acetoacetyl-CoA thiolase (also called thiolase II) is specific for the thiolysis of acetoacetyl-CoA and is involved in
biosynthetic pathways such as poly beta-hydroxybutyrate synthesis or steroid biogenesis. In eukaryotes, there are two forms of
3-ketoacyl-CoA thiolase: one located in the mitochondrion and the other in peroxisomes. There are two conserved
cysteine residues important for thiolase activity. The first, located in the N-terminal section of the enzymes, is involved
in the formation of an acyl-enzyme intermediate; the second, located at the C-terminal extremity, is the active site base
involved in deprotonation in the condensation reaction. Mammalian nonspecific lipid-transfer protein (nsL-TP) (also
known as sterol carrier protein 2) is a protein which seems to exist in two different forms: a 14 Kd protein (SCP-2)
and a larger 58 Kd protein (SCP-x). The former is found in the cytoplasm or the mitochondria and is involved in lipid transport;
the latter is found in peroxisomes.
The C-terminal part of SCP-x is identical to SCP-2 while the N-terminal portion is evolutionarily related to thiolases [4].
Three signature patterns have been developed for this family of proteins, two of which are based on the regions around
the biologically important cysteines. The third is based on a highly conserved region in the C-terminal part of
these proteins.

Consensus pattern: [LIVM]-[NST]-x(2)-C-[SAGLI]-[ST]-[SAG]-[LIVMFYNS]-x-[STAG]-[LIVM]-x(6)-[LIVM] [C is involved
in formation of acyl-enzyme intermediate]

Consensus pattern: N-x(2)-G-G-x-[LIVM]-[SA]-x-G-H-P-x-[GA]-x-[ST]-G

[ 1] Peoples O.P., Sinskey A.J. J. Biol. Chem. 264:15293-15297(1989). [ 2] Yang S.-Y., Yang X.-Y.H., Healy-Louie G.,
transmembrane anchor. TM2 to TM4: transmembrane regions 2 to 4. 'C': conserved cysteine. '*': position of the pattern.
A conserved region that includes two cysteines and seems to be located in a short cytoplasmic loop between two
transmembrane domains has been selected as a signature for these proteins.

Consensus pattern: G-x(3)-[LIVMF]-x(2)-[GSA]-[LIVMF](2)-G-C-x-[GA]-[STA]-x(2)-[EG]-x(2)-[CWN]-[LIVM](2)
[ 1] Levy S., Nguyen V.Q., Andria M.L., Takahashi S. J. Biol. Chem. 266:14597-14602(1991). [ 2] Tomlinson M.G.,
Williams A.F., Wright M.D. Eur. J. Immunol. 23:136-140(1993). [ 3] Barclay A.N., Birkeland M.L., Brown M.H., Beyers A.D.,
Davis S.J., Somoza C., Williams A.F. The leucocyte antigen factsbook. Academic Press, London / San Diego, (1993).
[1563] 673. Tryptophan synthase alpha chain signature

Tryptophan synthase catalyzes the last step in the biosynthesis of tryptophan: the conversion of indoleglycerol
phosphate and serine to tryptophan and glyceraldehyde 3-phosphate [1,2]. It has two functional domains: one for the aldol
cleavage of indoleglycerol phosphate to indole and glyceraldehyde 3-phosphate and the other for the synthesis of
tryptophan from indole and serine. In bacteria and plants [3], each domain is found on a separate subunit (alpha and
beta chains), while in fungi the two domains are fused together on a single multifunctional protein. A conserved region
that contains three conserved acidic residues has been selected as a signature pattern for the alpha chain. The first
and the third acidic residues are believed to serve as proton donors/acceptors in the enzyme's catalytic mechanism.
Consensus pattern: [LIVM]-E-[LIVM]-G-x(2)-[FYC]-[ST]-[DE]-[PA]-[LIVMY]-[AGLI]-[DE]-G

[ 1] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989). [ 2] Hyde C.C., Miles E.W. Bio/Technology 8:27-32(1990).
[ 3] Berlyn M.B., Last R.L., Fink G.R. Proc. Natl. Acad. Sci. U.S.A. 86:4604-4608(1989).
[1564] 674. Tryptophan synthase beta chain pyridoxal-phosphate attachment site

Tryptophan synthase catalyzes the last step in the biosynthesis of tryptophan: the conversion of indoleglycerol
phosphate and serine to tryptophan and glyceraldehyde 3-phosphate [1,2]. It has two functional domains: one for the aldol
cleavage of indoleglycerol phosphate to indole and glyceraldehyde 3-phosphate and the other for the synthesis of
tryptophan from indole and serine. In bacteria and plants [3], each domain is found on a separate subunit (alpha and
beta chains), while in fungi the two domains are fused together on a single multifunctional protein. The beta chain of
the enzyme requires pyridoxal-phosphate as a cofactor. The pyridoxal-phosphate group is attached to a lysine residue.
The region around this lysine residue also contains two histidine residues which are part of the pyridoxal-phosphate
binding site. The signature pattern for the tryptophan synthase beta chain is derived from that conserved region.

Consensus pattern: [LIVM]-x-H-x-G-[STA]-H-K-x-N [K is the pyridoxal-P attachment site]

[ 1] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989). [ 2] Hyde C.C., Miles E.W. Bio/Technology 8:27-32(1990).
[ 3] Berlyn M.B., Last R.L., Fink G.R. Proc. Natl. Acad. Sci. U.S.A. 86:4604-4608(1989).
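
As an illustration of how an annotated residue within a signature can be located, the following sketch scans a sequence with the beta chain pattern [LIVM]-x-H-x-G-[STA]-H-K-x-N and reports the position of the lysine that the pattern marks as the pyridoxal-phosphate attachment site. This is a sketch under stated assumptions only: the function name plp_lysine_positions and the example sequence are hypothetical, and positions are reported 1-based.

```python
import re

# Regular expression equivalent of [LIVM]-x-H-x-G-[STA]-H-K-x-N.
PLP_SITE = re.compile(r"[LIVM].H.G[STA]HK.N")

# The lysine (K) is the 8th residue of the pattern, i.e. offset 7 from the match start.
LYSINE_OFFSET = 7


def plp_lysine_positions(sequence: str) -> list:
    """Return the 1-based positions of the pattern lysine for each non-overlapping match."""
    return [m.start() + LYSINE_OFFSET + 1 for m in PLP_SITE.finditer(sequence)]


if __name__ == "__main__":
    # Hypothetical sequence fragment, constructed only to contain one match.
    print(plp_lysine_positions("MSSLNHTGSHKINAA"))
```

Note that finditer reports non-overlapping matches only, which is sufficient here because the pattern is ten residues long and unlikely to overlap itself.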
[1565] 675. Serine proteases, trypsin family, active sites

The catalytic activity of the serine proteases from the trypsin family is provided by a charge relay system involving an
aspartic acid residue hydrogen-bonded to a histidine, which itself is hydrogen-bonded to a serine. The sequences in
the vicinity of the active site serine and histidine residues are well conserved in this family of proteases [1]. A partial
list of proteases known to belong to the trypsin family is shown below.
- Acrosin.
- Blood coagulation factors VII, IX, X, XI and XII, thrombin, plasminogen, and protein C.
- Cathepsin G.
- Chymotrypsins.
- Complement components C1r, C1s, C2, and complement factors B, D and I.
- Complement-activating component of RA-reactive factor.
- Cytotoxic cell proteases (granzymes A to H).
- Duodenase I.
- Elastases 1, 2, 3A, 3B (protease E), leukocyte (medullasin).
- Enterokinase (EC 3.4.21.9) (enteropeptidase).
- Hepatocyte growth factor activator.
- Hepsin.
- Glandular (tissue) kallikreins (including EGF-binding protein types A, B, and C, NGF-gamma chain, gamma-renin,
prostate specific antigen (PSA) and tonin).
- Plasma kallikrein.
- Mast cell proteases (MCP) 1 (chymase) to 8.
- Myeloblastin (proteinase 3) (Wegener's autoantigen).
- Plasminogen activators (urokinase-type, and tissue-type).
- Trypsins I, II, III, and IV.
- Tryptases.
- Snake venom proteases such as ancrod, batroxobin, cerastobin, flavoxobin, and protein C activator.
- Collagenase from common cattle grub and collagenolytic protease from Atlantic sand fiddler crab.
- Apolipoprotein(a).
- Blood fluke cercarial protease.
- Drosophila trypsin like proteases: alpha, easter, snake-locus.
- Drosophila protease stubble (gene sb).
- Major mite fecal allergen Der p III.
All the above proteins belong to family S1 in the classification
of peptidases [2,E1] and originate from eukaryotic species. It should be noted that bacterial proteases that belong to
family S2A are similar enough in the regions of the active site residues that they can be picked up by the same patterns.
These proteases are listed below.
- Achromobacter lyticus protease I.
- Lysobacter alpha-lytic protease.
- Streptogrisin A and B (Streptomyces proteases A and B).
- Streptomyces griseus glutamyl endopeptidase II.
- Streptomyces fradiae proteases 1 and 2.

Consensus pattern: [LIVM]-[ST]-A-[STAG]-H-C [H is the active site residue]

Consensus pattern: [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-[LIVMFYSTANQH]
[S is the active site residue]

[ 1] Brenner S. Nature 334:528-530(1988). [ 2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994). [E1]
[1566] 676. (tsp) Thrombospondin type 1 domain 
of the stamens by petals and the carpels by a new flower.
- Arabidopsis thaliana homeotic proteins Apetala1 (AP1), Apetala3 (AP3) and Pistillata (PI) which act locally to specify
the identity of the floral meristem and to determine sepal and petal development [4].
- Antirrhinum majus and tobacco homeotic proteins deficiens (DEFA) and globosa (GLO) [5]. Both proteins are
transcription factors involved in the genetic control of flower development. Mutations in DEFA or
GLO cause the transformation of petals into sepals and of stamina into carpels.
- Arabidopsis thaliana putative transcription factors AGL1 to AGL6 [6].
- Antirrhinum majus morphogenetic protein DEF H33 (squamosa).
In SRF the conserved domain has been shown [1] to be involved in DNA-binding and dimerization. A pattern that spans the
complete length of the domain has been derived. The profile also spans the length of the MADS-box.

Consensus pattern: R-x-[RK]-x(5)-I-x-[DNGSK]-x(3)-[KR]-x(2)-T-[FY]-x-[RK](3)-x(2)-[LIVM]-x-K(2)-A-x-E-[LIVM]-
[STA]-x-L-x(4)-[LIVM]-x-[LIVM](3)-x(6)-[LIVMF]-x(2)-[FY]

[ 1] Norman C., Runswick M., Pollock R., Treisman R. Cell 55:989-1003(1988). [ 2] Passmore S., Maine G.T., Elble R.,
Christ C., Tye B.-K. J. Mol. Biol. 204:593-606(1988). [ 3] Yanofsky M., Ma H., Bowman J., Drews G., Feldmann K.A.,
Meyerowitz E.M. Nature 346:35-39(1990). [ 4] Goto K., Meyerowitz E.M. Genes Dev. 8:1548-1560(1994). [ 5] Troebner
W., Ramirez L., Motte P., Hue I., Huijser P., Loennig W.-E., Saedler H., Sommer H., Schwarz-Sommer Z. EMBO J.
11:4693-4704(1992). [ 6] Ma H., Yanofsky M.F., Meyerowitz E.M. Genes Dev. 5:484-495(1991). [E1]
[1561] 671. Transketolase signatures

Transketolase (EC 2.2.1.1) (TK) catalyzes the reversible transfer of a two-carbon ketol unit from xylulose 5-phosphate
to an aldose receptor, such as ribose 5-phosphate, to form sedoheptulose 7-phosphate and glyceraldehyde 3-phosphate.
This enzyme, together with transaldolase, provides a link between the glycolytic and pentose-phosphate pathways.
TK requires thiamin pyrophosphate as a cofactor. In most sources where TK has been purified, it is a homodimer
of approximately 70 Kd subunits. TK sequences from a variety of eukaryotic and prokaryotic sources [1,2] show that
the enzyme has been evolutionarily conserved. In the peroxisomes of the methylotrophic yeast Hansenula polymorpha,
there is a highly related enzyme, dihydroxy-acetone synthase (DHAS) (EC 2.2.1.3) (also known as formaldehyde
transketolase), which exhibits a very unusual specificity by including formaldehyde amongst its substrates.
1-deoxyxylulose-5-phosphate synthase (DXP synthase) [3] is an enzyme so far found in bacteria (gene dxs) and plants (gene
CLA1) which catalyzes the thiamin pyrophosphate-dependent acyloin condensation reaction between carbon atoms
2 and 3 of pyruvate and glyceraldehyde 3-phosphate to yield 1-deoxy-D-xylulose-5-phosphate (dxp), a precursor in
the biosynthetic pathway to isoprenoids, thiamin (vitamin B1), and pyridoxol (vitamin B6). DXP synthase is evolutionarily
related to TK. Two regions of TK have been selected as signature patterns. The first, located in the N-terminal section,
contains a histidine residue which appears to function in proton transfer during catalysis [4]. The second, located in the
central section, contains conserved acidic residues that are part of the active cleft and may participate in
substrate-binding [4].

Consensus pattern: R-x(3)-[LIVMTA]-[DENQSTHKF]-x(5,6)-[GSN]-G-H-[PLIVMF]-[GSTA]-x(2)-[LIMC]-[GS]
Consensus pattern: G-[DEQGSA]-[DN]-G-[PAEQ]-[ST]-[HQ]-x-[PAGM]-[LIVMYAC]-[DEFYW]-x(2)-[STAP]-x(2)-[RGA]
[ 1] Abedinia M., Layfield R., Jones S.M., Nixon P.F., Mattick J.S. Biochem. Biophys. Res. Commun. 183:1159-1166
(1992). [ 2] Fletcher T.S., Kwee I.L., Nakada T., Largman C., Martin B.M. Biochemistry 31:1892-1896(1992). [ 3]
Sprenger G.A., Schorken U., Wiegert T., Grolle S., De Graaf A.A., Taylor S.V., Begley T.P., Bringer-Meyer S., Sahm
H. Proc. Natl. Acad. Sci. U.S.A. 94:12857-12862(1997). [ 4] Lindqvist Y., Schneider G., Ermler U., Sundstroem M. EMBO
J. 11:2373-2379(1992).

[1562] 672. Transmembrane 4 family signature

Recently a number of eukaryotic cell surface antigens have been found to be evolutionarily related [1,2,3]. The proteins
known to belong to this family are listed below:
- Mammalian antigen CD9 (MIC3), which is involved in platelet activation and aggregation.
- Mammalian leukocyte antigen CD37, expressed on B lymphocytes.
- Mammalian leukocyte antigen CD53 (OX-44), which may be involved in growth regulation in hematopoietic cells.
- Mammalian lysosomal membrane protein CD63 (melanoma-associated antigen ME491; antigen AD1).
- Mammalian antigen CD81 (cell surface protein TAPA-1), which may play an important role in the regulation of
lymphoma cell growth.
- Mammalian antigen CD82 (protein R2; antigen C33; Kangai 1 (KAI1)), which associates with CD4 or CD8 and delivers
costimulatory signals for the TCR/CD3 pathway.
- Mammalian antigen CD151 (SFA-1; platelet-endothelial tetraspan antigen 3 (PETA-3)).
- Mammalian cell surface glycoprotein A15 (TALLA-1; MXS1).
- Mammalian novel antigen 2 (NAG-2).
- Human tumor-associated antigen CO-029.
- Schistosoma mansoni and japonicum 23 Kd surface antigen (SM23 / SJ23).
These proteins share the following characteristics: they all seem to be type III membrane proteins (type III proteins are integral
membrane proteins that contain an N-terminal membrane-anchoring domain which is not cleaved during biosynthesis
and which functions both as a translocation signal and as a membrane anchor); they also contain three additional
transmembrane regions, at least seven conserved cysteine residues, and are of approximately the same size (218
to 284 residues). These proteins are collectively known as the 'transmembrane 4 superfamily' (TM4) because they span
the plasma membrane four times. A schematic diagram of the domain structure of these proteins is shown below.

[Schematic, partly illegible in this copy: Cyt, TMa, extracellular, TM2, Cyt, TM3, Extracellular, TM4, Cyt, with conserved cysteines marked 'C'.]

Cyt: cytoplasmic domain. TMa:
the active site residue

[ 1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [ 2] D'Andrea A., Pellman D. Crit.
Rev. Biochem. Mol. Biol. 33:337-352(1998). [ 3] Johnston S.C., Larsen C.N., Cook W.J., Wilkinson K.D., Hill C.P. EMBO
J. 16:3787-3796(1997). [ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).
[1577] 682. Ubiquitin carboxyl-terminal hydrolases family 2 signatures (UCH-1)

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol proteases that recognize and
hydrolyze the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of
poly-ubiquitin precursors as well as that of ubiquitinated proteins. There are two distinct families of UCH. The second
class consists of large proteins (800 to 2000 residues) and is currently represented by:
- Yeast UBP1, UBP2, UBP3, UBP4 (or DOA4/SSV7), UBP5, UBP7, UBP9, UBP10, UBP11, UBP12, UBP13, UBP14, UBP15 and UBP16.
- Human tre-2.
- Human isopeptidase T.
- Human isopeptidase T-3.
- Mammalian Ode-1.
- Mammalian Unp.
- Mouse Dub-1.
- Drosophila fat facets protein (gene faf).
- Mammalian faf homolog.
- Drosophila D-Ubp-64E.
- Caenorhabditis elegans hypothetical protein R10E11.3.
- Caenorhabditis elegans hypothetical protein K02C4.3.
These proteins only share two regions of similarity. The first region contains a conserved cysteine which is probably
implicated in the catalytic mechanism. The second region contains two conserved histidine residues, one of which is also
probably implicated in the catalytic mechanism. Signature patterns for both conserved regions have been developed.

Consensus pattern: G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-[SACV]-x-[LIVMS]-Q [C is the putative
active site residue]

Consensus pattern: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y [The two H's are putative active site residues]
[ 1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [ 2] D'Andrea A., Pellman D. Crit.
Rev. Biochem. Mol. Biol. 33:337-352(1998). [ 3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).
[1578] 683. Ubiquitin carboxyl-terminal hydrolases family 2 signatures (UCH-2)

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol proteases that recognize and
hydrolyze the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of
poly-ubiquitin precursors as well as that of ubiquitinated proteins. There are two distinct families of UCH. The second
class consists of large proteins (800 to 2000 residues) and is currently represented by:
- Yeast UBP1, UBP2, UBP3, UBP4 (or DOA4/SSV7), UBP5, UBP7, UBP9, UBP10, UBP11, UBP12, UBP13, UBP14, UBP15 and UBP16.
- Human tre-2.
- Human isopeptidase T.
- Human isopeptidase T-3.
- Mammalian Ode-1.
- Mammalian Unp.
- Mouse Dub-1.
- Drosophila fat facets protein (gene faf).
- Mammalian faf homolog.
- Drosophila D-Ubp-64E.
- Caenorhabditis elegans hypothetical protein R10E11.3.
- Caenorhabditis elegans hypothetical protein K02C4.3.
These proteins only share two regions of similarity. The first region contains a conserved cysteine which is probably
implicated in the catalytic mechanism. The second region contains two conserved histidine residues, one of which is also
probably implicated in the catalytic mechanism. Signature patterns for both conserved regions have been developed.

Consensus pattern: G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-[SACV]-x-[LIVMS]-Q [C is the putative
active site residue]

Consensus pattern: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y [The two H's are putative active site residues]
[ 1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [ 2] D'Andrea A., Pellman D. Crit.
Rev. Biochem. Mol. Biol. 33:337-352(1998). [ 3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).
[1579] 684. UDP-glycosyltransferases signature

UDP glycosyltransferases (UGT) are a superfamily of enzymes that catalyze the addition of the glycosyl group from
a UTP-sugar to a small hydrophobic molecule. This family currently consists of:
- Mammalian UDP-glucuronosyl transferases (UDPGT) [1,2]. A large family of membrane-bound microsomal enzymes
which catalyze the transfer of glucuronic acid to a wide variety of exogenous and endogenous lipophilic substrates.
These enzymes are of major importance in the detoxification and subsequent elimination of xenobiotics such as drugs
and carcinogens.
- A large number of putative UDPGT from Caenorhabditis elegans.
- Mammalian 2-hydroxyacylsphingosine 1-beta-galactosyltransferase [3] (also known as UDP-galactose-ceramide
galactosyltransferase). This enzyme catalyzes the transfer of galactose to ceramide, a key enzymatic step in the
biosynthesis of galactocerebrosides, which are abundant sphingolipids of the myelin membrane of the central nervous
system and peripheral nervous system.
- Plants flavonol O(3)-glucosyltransferase. An enzyme [4] that catalyzes the transfer of glucose from UDP-glucose to
a flavanol. This reaction is essential and one of the last steps in anthocyanin pigment biosynthesis.
- Baculoviruses ecdysteroid UDP-glucosyltransferase (EC 2.4.1.-) [5] (egt). This enzyme catalyzes the transfer of
glucose from UDP-glucose to ecdysteroids, which are insect molting hormones. The expression of egt in the insect host
interferes with the normal insect development by blocking the molting process.
- Prokaryotic zeaxanthin glucosyl transferase (gene crtX), an enzyme involved in carotenoid biosynthesis that
catalyses the glycosylation reaction which converts zeaxanthin to zeaxanthin-beta-diglucoside.
- Streptomyces macrolide glycosyltransferases [6]. These enzymes specifically inactivate macrolide antibiotics via
2'-O-glycosylation using UDP-glucose.
These enzymes share a conserved domain of about 50 amino acid residues located in their C-terminal section and from
which a pattern has been extracted to detect them.
Consensus pattern: [FW]-x(2)-Q-x(2)-[LIVMYA]-[LIMV]-x(4,6)-[LVGAC]-[LVFYA]-[LIVMF]-[STAGCM]-[HNQ]-
[1567] [1] Bork P; FEBS Lett 1993;327:125-130.

[1568] 677. Tubulin subunits alpha, beta, and gamma signature

Tubulins [1,2], the major constituent of microtubules, are dimeric proteins which consist of two closely related subunits
(alpha and beta). Tubulin binds two molecules of GTP at two different sites (N and E). At the E (exchangeable) site,
GTP is hydrolyzed during incorporation into the microtubule. Near the E site is an invariant region rich in glycine
residues which is found in both chains. A signature
pattern was developed from this region. With the exception of the simple eukaryotes, most species express a variety
of closely related alpha and beta isotypes. In most species there is a third member of the tubulin family: gamma. Gamma
tubulin is found at microtubule organizing centers (MTOC) such as the spindle poles or the centrosome,
suggesting that it is involved in the minus-end nucleation of microtubule assembly.

Consensus pattern: [SAG]-G-G-T-G-[SA]-G

[ 1] Cleveland D.W., Sullivan K.F. Annu. Rev. Biochem. 54:331-365(1985). [ 2] Joshi H.C., Cleveland D.W. Cell Motil.
[1 569] Tubulin-beta mRNA autoregulation signal 

The stability of beta-tubulin mRNAs is autoregulated by their own translation product [1]. Unpolymerized tubulin subunits
bind directly (or activate a factor(s) which binds co-translationally) to the nascent N-terminus of beta-tubulin.
b|ndi^,^uc«d.hrcK.gh*^ 

The recognition element has been shown to be the first four amino acids of beta-tubulin: Met-Arg-Glu-Ile. Mutations
to this sequence abolish the autoregulation effect (except for the replacement of Glu by Asp); moving this
sequence to an internal region of a polypeptide also suppresses the autoregulation effect.
Consensus pattern: <M-R-[DE]-[IL]
[1] Cleveland D.W. Trends Biochem. Sci. 13:339-343(1988).
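Because the recognition element has to sit at the very start of the polypeptide, a check for this signal must be anchored to the N-terminus rather than searched for anywhere in the sequence. A minimal Python sketch follows, assuming the pattern reads <M-R-[DE]-[IL] with '<' marking the N-terminus; the function name and the toy sequences are hypothetical.

```python
import re

# '^' plays the role of the '<' anchor in the consensus pattern.
AUTOREGULATION_SIGNAL = re.compile(r"^MR[DE][IL]")


def has_autoregulation_signal(protein: str) -> bool:
    """Return True if the sequence starts with M-R-[DE]-[IL]."""
    return AUTOREGULATION_SIGNAL.match(protein.upper()) is not None


# Made-up sequences: a conforming N-terminus, a Glu-to-Lys change,
# and the same tetrapeptide moved to an internal position.
for seq in ("MREIVHIQ", "MRKIVHIQ", "AAMREIVHIQ"):
    print(seq, has_autoregulation_signal(seq))
```

The third sequence is rejected on purpose, mirroring the statement that the signal placed in an internal region no longer supports autoregulation.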

f^^,,i*"^'^-'y"•2c) '\^'™«^'^'^fefR^WV synthetase signatures. AminoacyMRNA synthetases 

£ fir^^ii'^ are a group of enzymes which activate amino acids and transfer them to specif^tRNA r^eculJal 
,RM?!L!f " " Prokaryotic organisms there are at least twenty different hmToTSZvT 

tRNA synfrietases, one for each different amino acid. In eukaryotes there are generally Wto^S^SiSaZZSI 

funct«n. they are WKlely diverse in temis of subunit size and of quatemaor stnicture The 3^21 soST^ 

to as class-II synthetases [2 to 6] and probably have a common folding pattern in their catalytic domain for the binding
of ATP and amino acid which is different to the Rossmann fold observed for the class I synthetases. Class-II tRNA
synthetases do not share a high degree of similarity, however at least three conserved regions are present.
Signature patterns have been derived from two of these regions.

Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

S f il iS^m^P T ^-.^ *^^25-158(1987).(2] Delarue M.. Moias D. ^i^Essa^l 5 675-687 

JSa 2 SsMpt^ 7^ B«*ern. Sci. 16: 1-3(1 991 ).f 4] Nagel G.M.. Doolittle R.F. Proc. Natl. La £^ 
A. 88.8121^125(1991). [51 Cusack 8.. Haertlein M.. Lebemian R. Nucleic Acids Res 19-3489-349af199m fiin^^^l^l 

S*!^lc'\' ^'"^"'^ o' ^ DNA repair protein USA domain that inteiacts with HI\M Vpr Dieckmann T With«« 

Ward ES Jarosinski MA. Liu CF. Chen IS. Feigon J. Nat Struct Biol 1998;S: 1042-1 047 Withers- 
[1575] 680. UBX domain

f'J^^^rnR?^ ^'^^ ^'^^ Shplp Number of members 19 

Lre:??re^B"^rsT,9j^lT7?"^^ 

[1576] 681. (UCH) Ubiquitin carboxyl-terminal hydrolases family 1 cysteine active site

hSe" ^S^^^ «;eubiquitina«ng enzymes) ^^ .2] are th». proteases that recogn^e and 

r^TZT C-ternimal g^cine of ubiquitin. These enzymes are involved in the pro^sino of 

.rem the reg^n around that resk^ue. Consensus pattem: Q-x(3U4^?S::S^S'S^^Ia^^^^ 
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last potential transmembrane region has been selected as a signature pattern.. 

Consensus pattern: G-[STIF]-V-x(2)-[LIVM]-x(6)-[LIVMF]-x(3)-[DQ]-x(3)-[LIV]-x-[LIV]-P-N-x(2)-[LIVMF]-[LIVFSTA]-x(5)-N

[1] Bairoch A. Unpublished observations (1997).

[1586] 689. Uncharacterized protein family UPF0004 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli
hypothetical protein yliG. - Escherichia coli hypothetical protein yleA and HI0019, the corresponding Haemophilus influenzae
protein. - Bacillus subtilis hypothetical protein yqeV. - Helicobacter pylori hypothetical protein HP0269. - Helicobacter
pylori hypothetical protein HP0285. - Mycoplasma iowae hypothetical protein in 16S RNA 5'region. - Mycobacterium
leprae hypothetical protein B2235_C2_195. - Pseudomonas aeruginosa hypothetical protein in hemL 3'region. -
Synechocystis strain PCC 6803 hypothetical protein slr00d2. - Synechocystis strain PCC 6803 hypothetical protein
sll0996. - Methanococcus jannaschii hypothetical protein MJ0865. - Methanococcus jannaschii hypothetical protein
MJ0867. - Caenorhabditis elegans hypothetical protein F25B5.5. The size of these proteins ranges from 47 to 61 Kd.
They contain six conserved cysteines, three of which are clustered in a region that can be used as a signature pattern.
Consensus pattern: [LIVM]-x-[LIVMT]-x(2)-G-C-x(3)-C-[STAN]-[FY]-C-x-[LIVM]-x(4)-G
[1] Bairoch A. Unpublished observations (1997).
[1587] 690. Uncharacterized protein family UPF0005 signature 

The following proteins seem to be evolutionarily related [1]: - Mammalian protein TEGT (Testis Enhanced Gene
Transcript). - Escherichia coli hypothetical protein yccA and HI0044, the corresponding Haemophilus influenzae protein. -
A probable Pseudomonas aeruginosa ortholog of yccA. These are proteins of about 25 Kd which seem to contain
seven transmembrane domains. A signature pattern that corresponds to a region that starts with the beginning of the
third transmembrane domain and ends in the middle of the fourth one has been developed.

Consensus pattern: G-[LIVM](2)-[SA]-x(5,8)-G-x(2)-[LIVM]-G-P-x-L-x(4)-[SAG]-x(4,6)-[LIVM](2)-x(2)-A-x(3)-T-A-[LIVM](2)-F

[1] Walter L., Marynen P., Szpirer J., Levan G., Guenther E. Genomics 28:301-304(1995).
[1588] 691. Uncharacterized protein family UPF0006 signatures

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast chromosome II
hypothetical protein YBL055c. - Escherichia coli hypothetical protein ycfH and HI0454, the corresponding Haemophilus
influenzae protein. - Escherichia coli hypothetical protein yigW. - Escherichia coli hypothetical protein yjjV and HlOOei,
the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yabD. - Haemophilus
influenzae hypothetical protein HI1664. - Mycoplasma genitalium hypothetical protein MG009. These are proteins of from
24 to 47 Kd which contain a number of conserved regions. They can be picked up in the database by the following
patterns.

Consensus pattern: [LIVMFY](2)-D-[STA]-H-x-H-[LIVMF]-[DN]
Consensus pattern: P-[LIVM]-x-[LIVM]-H-x-R-x-[TA]-x-[DE]
Consensus pattern: [LVSA]-[LIVA]-x(2)-[LIVM]-[PS]-x(3)-L-[LIVM]-[LIVMS]-E-T-D-x-P
[1] Bairoch A., Rudd K.E. Unpublished observations (1995).
[1589] 692. Uncharacterized protein family UPF0007 signature 

The following proteins seem to be evolutionarily related [1]: - Escherichia coli hypothetical protein ygbP and HI0672,
the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yacM. - Mycobacterium
tuberculosis hypothetical protein MtCY06G11.29c. - Synechocystis strain PCC 6803 hypothetical protein slr0951. - A
Rhodobacter capsulatus hypothetical protein in nifR3 5'region. Except for the Rhodobacter protein, which contains a
C-terminal extension, all these proteins have from 225 to 236 amino acids. They are hydrophilic proteins that can be
picked up in the database by the following pattern.
Consensus pattern: V-L-[IV]-H-D-[GA]-A-R
[1] Bairoch A. Unpublished observations (1997).
[1590] 693. Uncharacterized protein family UPF0015 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast chromosome II
hypothetical protein YBR002c. - Yeast chromosome XIII hypothetical protein YMR101c. - Escherichia coli hypothetical
protein yaeU and HI0920, the corresponding Haemophilus influenzae protein. - Helicobacter pylori hypothetical protein
HP1721. - Mycobacterium leprae hypothetical protein B1937_F2_65. - A Corynebacterium glutamicum hypothetical
protein in aroF 3'region. - A Streptomyces fradiae hypothetical protein in transposon Tn4556. - Synechocystis strain
PCC 6803 hypothetical protein sll0505. - Methanococcus jannaschii hypothetical protein MJ1372. These are proteins
of about 26 to 40 Kd whose central region is well conserved. They can be picked up in the database by the following
pattern.

Consensus pattern: [DE]-[LIVMF](3)-R-T-[SG]-G-x(2)-R-x-S-x-[FY]-[LIVM](2)-W-Q-

[1] Wolfe K.H., Lohan A.J.E. Yeast 10:S41-S46(1994).

[1591] 694. Uncharacterized protein family UPF0016 signature 
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{1980).(2iBurchenB..NebertDW Nelson DR Bock K W lvan=.nT^. . " 

Acad. Sci. U.S.A 90:10265.10269(,993).C4,FurtekO.. Schiefett>etf J.W^jZ 

1 1 :473-481 (1988).1 5] O'Reilly O.R. MUler L K ««nnsion ^.. Nelson O.E. Jr. Plant Mol. Biol. 

nt^ nn^".'^^'^^®* '®' ^"^^ ^ - ^"^""^ • ■^^"'^^ C.. Salas J./^ Gene 134 139-140(1993) 
[1580] 685. UDP-glucose/GDP-mannose dehydrogenase family

[1581] The UDP-glucose/GDP-mannose dehydrogenases are a small group of enzymes which

?2T/. W*k«oacB(, C«,1»M HE ^B?^a?2?^^ 

ME; J Biol Chem 1997;272:3416-3422. ^ * Tanner 

[1583] 686. Uracil-DNA glycosylase signature

N-glycosylic bond. Uracil in DNA can arise as a result of misincorporation of dUMP residues by DNA
polymerase or deamination of cytosine. The sequence of uracil-DNA glycosylase is extremely well conserved in
bacteria and eukaryotes as well as in herpes viruses. More distantly related uracil-DNA glycosylases are also found
in poxviruses [3]. In eukaryotic cells, UNG activity is found in both the nucleus and the mitochondria. Humt^i^^jG?

Z^^t^T^^ localization [4], but the presence of a mitochondrial transit peptide has not been directly
demonstrated. As a signature for this type of enzyme, the most N-terminal conserved region has been selected.

i^sr^nr^is^a™^ ^^-^^ 

Consensus pattern: [KR]-[LIV]-[LIVC]-[LIVM]-x-G-[QI]-D-P-Y [D is the active site residue]

I ij ssancar A., Sancar G.B. Annu. Rev. Biochem 57 P9-67/1 cmou oi nie^n t a ■ . 

373-487-493f199S^ f fil Mm r n M o. 5] Sawa R. McAuley-Hecht K.. Brown T, Pearl L. Nature 

sea. G Wu.er J.. Deriel J.K., Silver M.A Pr^t^^/iS^S'J.lTJ Sil^^^ 

donnas. J. BioL Chem. 268- 131 0-131 9/1 9991 nniRamoe n c 1 ;»-4»kit JV^.^^^ ' ' ' * ■ 

(1993). «»• '<»iy-i JiS(i993).[10]Bames D.E.. LndahlT, SedgwickB. Curr.Opin. Ceil Biol. 5.424-433 

[1584] 687. Uncharacterized protein family UPF0001 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities:

* II hypothetical protein YBU)36c. - Caenorhabditis elegans hypothetical protein F09ES fl 

d^nonasaeruginosahypothe.^1 protein irpi^ 

gion. These are proteins of from 25 to 30 Kd which contain a number of conserved regions. A
region which is located in the first third of these proteins has been selected as a signature pattern.

Consensus pattern: [FW]-H-[FM]-[IV]-G-x-[LIV]-Q-x-[NKR]-K-x(3)-[LIV]
[1] Bairoch A., Rudd K.E. Unpublished observations (1996).
[1585] 688. Uncharacterized protein family UPF0003 signature

I^I^-SSSSttttT'f ^ T'" "° ■ ^^^^ con protein 

aeiA Eschenchia coli hypothetical protein yggB. - Escherichia coli hypothetical protein vieP and Hl0iq«; 1 
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contain at least six or seven transmembrane regions. A conserved region located in the central section of these proteins 
has been developed as a signature pattern.

Consensus pattern: Y-x(2)-F-[LIVMA](2)-x-L-x(4)-G-x(2)-F-[EQ]-[LIVMF]-P-[LIVM]
[1] Bairoch A., Rudd K.E. Unpublished observations (1996).

[1602] 702. Uncharacterized protein family UPF0034 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli
hypothetical protein yhdG and HI0979, the corresponding Haemophilus influenzae protein. - Escherichia coli hypothetical
protein yjbN and HI0634, the corresponding Haemophilus influenzae protein. - Escherichia coli hypothetical protein
yohl and HI0270, the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yacF. -
Rhodobacter capsulatus protein nifR3 and related proteins in Azospirillum brasilense and Rhizobium leguminosarum.
- Synechocystis strain PCC 6803 hypothetical protein slr0644. - Synechocystis strain PCC 6803 hypothetical protein
sll0926. - Caenorhabditis elegans hypothetical protein C45G9.2. - Yeast protein SMM1. - Yeast hypothetical protein
YLR401C. - Yeast hypothetical protein YLR405W. - Yeast hypothetical protein YML080w. Although it has been proposed
[2] that Rhodobacter capsulatus nifR3 is a transcriptional regulatory protein, it is believed that these proteins constitute
a family of enzymes whose active site could include a conserved cysteine, which has been used as the central part of
a signature pattern.

Consensus pattern: [LIVM]-[DNG]-[LIVM]-N-x-G-C-P-x(3)-[LIVMASQ]-x(5)-G-[SAC]

[1] Bairoch A., Rudd K.E. Unpublished observations (1995). [2] Foster-Hartnett D., Cullen P.J., Gabbert K.K., Kranz
R.G. Mol. Microbiol. 8:903-914(1993).

[1603] 703. Uncharacterized protein family UPF0038 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli
hypothetical protein yacE and HI0890, the corresponding Haemophilus influenzae protein. - Mycobacterium tuberculosis
hypothetical protein MtCY01B2.23 and O410, the corresponding Mycobacterium leprae protein. - Synechocystis strain
PCC 6803 hypothetical protein slr0553. - Other hypothetical proteins from Aeromonas hydrophila, Bacteroides
nodosus, Neisseria gonorrhoeae, Pseudomonas putida, Thermus thermophilus and Xanthomonas campestris. - Human
hypothetical protein pOV-2. - Yeast hypothetical protein YDR196C. - Caenorhabditis elegans hypothetical protein
T05G5.5. These proteins all contain, in their N-terminal extremity, an ATP/GTP-binding motif 'A' (P-loop) (see
<PDOC00017>). The size of these proteins ranges from 200 to 290 residues (with the exception of the Mycobacterial
sequences, which are 410 residues long). A conserved region some 50 residues away from the ATP-binding P-loop
has been developed as a signature pattern.

Consensus pattern: G-x-[LI]-x-R-x(2)-L-x(4)-F-x(8)-[LIV]-x(5)-P-x-[LIV]
[1] Rudd K.E., Bairoch A. Unpublished observations (1997).

[1604] 704. Ubiquitin-conjugating enzymes active site 

Ubiquitin-conjugating enzymes (UBC or E2 enzymes) [1,2,3] catalyze the covalent attachment of ubiquitin to target
proteins. An activated ubiquitin moiety is transferred from a ubiquitin-activating enzyme (E1) to E2, which later ligates
ubiquitin directly to substrate proteins with or without the assistance of 'N-end' recognizing proteins (E3). In most
species there are many forms of UBC (at least 9 in yeast) which are implicated in diverse cellular functions. A cysteine
residue is required for ubiquitin-thiolester formation. There is a single conserved cysteine in UBCs and the region
around that residue is conserved in the sequence of known UBC isozymes. That region has been used as a signature
pattern.

Consensus pattern: [FYWLSP]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-[LIV]-x-[LIV] [C is the active site residue]
[1] Jentsch S., Seufert W., Sommer T., Reins H.-A. Trends Biochem. Sci. 15:195-198(1990). [2] Jentsch S., Seufert
W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [3] Hershko A. Trends Biochem. Sci. 16:265-268(1991).
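Since the pattern singles out one residue as the active site, a scan can report the position of that cysteine rather than only the presence of a match. The Python sketch below does this with a named group. It assumes the printed pattern reads [FYWLSP]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-[LIV]-x-[LIV]; the function name and the toy sequences are hypothetical.

```python
import re

# x(3,4) becomes .{3,4}; the active-site cysteine is captured by name.
UBC_ACTIVE_SITE = re.compile(
    r"[FYWLSP]H[PC][NH][LIV].{3,4}G.[LIV](?P<active_cys>C)[LIV].[LIV]"
)


def active_site_cysteines(protein: str) -> list:
    """Return 1-based positions of cysteines lying in a UBC active-site match."""
    return [m.start("active_cys") + 1 for m in UBC_ACTIVE_SITE.finditer(protein)]


# Made-up sequences: one that fits the pattern and one with Cys replaced by Ser.
print(active_site_cysteines("MKFHPNIAAAGALCLALEE"))
print(active_site_cysteines("MKFHPNIAAAGALSLALEE"))
```

A pattern hit only flags a candidate residue; whether that cysteine actually forms the ubiquitin thiolester has to be established experimentally.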
[1605] 705. Uroporphyrinogen decarboxylase signatures 

Uroporphyrinogen decarboxylase (URO-D), the fifth enzyme of the heme biosynthetic pathway, catalyzes the sequential
decarboxylation of the four acetyl side chains of uroporphyrinogen to yield coproporphyrinogen [1]. URO-D deficiency
is responsible for the human genetic diseases familial porphyria cutanea tarda (fPCT) and hepatoerythropoietic
porphyria (HEP). The sequence of URO-D has been well conserved throughout evolution. The best conserved region is
located in the N-terminal section; it contains a perfectly conserved hexapeptide. There are two arginine residues in this
hexapeptide which could be involved in the binding, via salt bridges, to the carboxyl groups of the propionate side chains
of the substrate. This region has been used as a signature pattern. A second signature pattern is based on another
well conserved region which is located in the central section of the protein.
Consensus pattern: P-x-W-x-M-R-Q-A-G-R

Consensus pattern: G-F-[STAGCV]-[STAGC]-x-P-[FYW]-T-[LV]-x(2)-Y-x(2)-[AE]-[GK]

[1] Garey J.R., Labbe-Bois R., Chelstowska A., Rytka J., Harrison L., Kushner J., Labbe R. Eur. J. Biochem. 205:
1011-1016(1992).

[1606] 706. ubiE/COQ5 methyltransferase family signatures

The following methyltransferases have been shown [1] to share regions of similarities: - Escherichia coli ubiE, which
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YBR187W - FBsion yeast hypothetical protein SpACl7G8.08c. - Mouse protein pFT27 - Syriechccystis stra^ PCC 
6803 hypothet^al protein sll061 5. These are hydrophobic proteins ot 200 Jo 320 ac2£t^ZlZ^^ 

follow the second transmembrane domain has been selected as a signature pattern * oieciiy 

Consensus pattern: E-[LIVM]-G-D-K-T-F-[LIVMF](2)-A-

[1] Bairoch A. Unpublished observations (1996).

[1592] 695. Uncharacterized protein family UPF0021 signature 

The followir.g uncharacterized proteins have been shown (1 ) to share regions ot similarities: - Yeast chromosome VII 
hypothetKa protem YGL511 w. - Dictyostelium discoideum protein vegl36. - Methanococcus janna^TSSZti^ 
proteins MJ1 1 57 and MJI 478.These are proteins of from 300 to 36o residues, "mey can be pic^uTh meSS 
' "^^""^ "^'^ " N^erminalsection. Consensus pattern: C-K.x(2)-F-x(S) E^SS 

[1] Bairoch A. Unpublished observations (1997).

[1593] 696. Uncharacterized protein family UPF0023 signature

JiLl^lT"^ uncharacterized proteins have been shown [1 J to share regions of similarities: - Mouse protein 22A3 • 
Yeast chromosome Xllhypotheticalpfotei^ 

anococcus ^maschn hypothetical protein MJ0592.These are hydrophilic proteins of about So Kd. They ^ be ptk^L 
up in the database by the following pattern. • t ney can oe picKeo 

[1594] Consensus pattern: D-x-D^-[UV]-L-x(4)-V-F-x(3)-S-K-G- 
[1595] [1] Bairoch A. Unpublished obsen^tions (1997) 

,,f,^7.^"*«^<=««rized protein family UPF0024 signature. The following uncharacterized proteins have been 
Sl^hHrSl'^ '^"^ 1 "'^t'^- ' ^'^^'^^ hypothetical protein ygbO and HI0701 the correspond ng 
^^^nv^o^^^^"^"! ' "«««*«««'Py'°ri hypothetical protein HP0926. - Yeast chromosome XV hypothet 
S 2^^- * J^"^^ «'«9^« hypothetical protein B0024. 11.- Methanococcus jannaschii h^e - 

SSTy rSrg p'Tttr -^^^^ "^^'^^^ °' - ^ "^'i^Je 

[1597] Consensus pattern: G-x-K-D-[KR]-x-A-[LV]-T-x-Q-x-[LIVF]-[SGC]-

[1] Bairoch A. Unpublished observations (1997).

[1598] 698. Uncharacterized protein family UPF0025 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities- - Escherichia coli hvoo- 

K^l protein MG207^ - Methanococcus jannaschii hypothetical proteins MJ0623 and 1^936. These are hydraphilic 

proteins of about 20 Kd. They can be picked up in the database by the following pattern.

Consensus pattern: D-V-[LIV]-x(2)-G-H-[ST]-H-x(12)-[LIVMF]-N-P-G

[1] Bairoch A. Unpublished observations (1997).

[1599] 699. Uncharacterized protein family UPF0029 signature

Z^ll^^^^ PJ0««ins have been shown (1] to share regbns of similarities: - Yeast chromosome III 
SSSS and HmrS^Sl- 'r^T hypothetical p«,tein YDL1 77C. - Escherichia coli hypothetical 

rS»l ; ^^^"^P°^<^9 Haemophilus influenzae protein. - Bacillus subtilis hypothetfcal protein 

[1] Koonin E.V., Bork P., Sander C. EMBO J. 13:493-503(1994).

[1600] 700. Uncharacterized protein family UPF0030 signature

l^i^'^i"rSr'^v'^?1'^'' shown [1 ) to be highly similar - Yeast chromosome VI hypothetical 

S ^ T^„' T^."^ »'yP°*eticalprotein YMR09Sc. - Yeast chromosome XIV hypothet^pS 

YNL334C. - Bactllus subtite hypothetical protein yaaE. - HaemophUus influenzae hypothetical proteh^ms^ Zh- 
anococcus jannaschii hypothetical protein MJ1661 .These are hydrophilic proteins TabouM 9 to Si Kd C 
picked up inthe database by the following pattern. <««"i i»io*»Ktt iheycanbe 

Consensus pattern: [GA]-L-I-[LIV]-P-G-G-E-S-T-[STA]
[1] Bairoch A. Unpublished observations (1997).
[1601] 701. Uncharacterized protein family UPF0032 signature

The following uncharacterized proteins have been shown [Ij to share regions of similarities: - Escherichia coli hyoo- 
P-otein yigU and HI0188. the corresponding Haemophilus influenzae protein. - Baci«ust^buS^hyZS 
protei ycbT. - Mycobacterium tuberculosis hypothetical protein MtCY49.33c ^d U2126A. the c^rres^JS JiS 
bactenum leprae protein. - Synechocystis strain PCC 6803 hypothetical protein SII0194 Octo^tX^^Si?^ 
Phyra purpurea chlroplast hypothetical protein ycf4aThese proteins have from 245 to 3i?S?aSrrd^^; 
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the sequence. A profile was also developed that spans the complete length of the ubiquitin domain. 
Consensus pattern: K-x(2)-[LIVM]-x-[DESAK]-x(3)-[LIVM]-[PA]-x(3)-Q-x-[LIVM]-

[1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [2] Monia B.P., Ecker D.J., Crooke
S.T. Bio/Technology 8:209-215(1990). [3] Finley D., Varshavsky A. Trends Biochem. Sci. 10:343-347(1985). [4] Filippi
M., Tribioli C., Toniolo D. Genomics 7:453-457(1990). [5] Olvera J., Wool I.G. J. Biol. Chem. 268:17967-17974(1993).
[6] Kumar S., Yoshida Y., Noda M. Biochem. Biophys. Res. Commun. 195:393-399(1993). [7] Jones D., Candido E.P.
J. Biol. Chem. 268:19545-19551(1993). [8] Melnick L., Sherman F. J. Mol. Biol. 233:372-388(1993).
[1611] 710. VHS domain 

[1612] Domain present in VPS-27, Hrs and STAM. Number of members: 27 
[1613] 711. Vinculin family signatures 

Vinculin [1] is a eukaryotic protein that seems to be involved in the attachment of the actin-based microfilaments to the
plasma membrane. Vinculin is located at the cytoplasmic side of focal contacts or adhesion plaques. In addition to actin,
vinculin interacts with other structural proteins such as talin and alpha-actinins. Vinculin is a large protein of 116 Kd
(about 1000 residues). Structurally the protein consists of an acidic N-terminal domain of about 90 Kd separated
from a basic C-terminal domain of about 25 Kd by a proline-rich region of about 50 residues. The central part of the
N-terminal domain consists of a variable number (3 in vertebrates, 2 in Caenorhabditis elegans) of repeats of a 110
amino acid domain. Catenins [2] are proteins that associate with the cytoplasmic domain of a variety of cadherins.
The association of catenins with cadherins produces a complex which is linked to the actin filament network, and which
seems to be of primary importance for the cell-adhesion properties of cadherins. Three different types of catenins seem to
exist: alpha, beta, and gamma. Alpha-catenins are proteins of about 100 Kd which are evolutionarily related to vinculin.
In terms of their structure, the most significant differences are the absence, in alpha-catenin, of the repeated domain and
of the proline-rich segment. Two signature patterns for this family of proteins have been developed. The first pattern is
located in the N-terminal section of both vinculin and alpha-catenins and is part, in vinculin, of a domain that seems to
be involved in the interaction with talin. The second pattern is based on a conserved region in the N-terminal part of
the repeated domain of vinculin.

Consensus pattern: [KR]-x-[LIVMF]-x(3)-[LIVMA]-x(2)-[LIVM]-x(6)-R-Q-Q-E-L
Consensus pattern: [LIVM]-x-[QA]-A-x(2)-W-[IL]-x-[DN]-P

[1] Otto J.J. Cell Motil. Cytoskeleton 16:1-6(1990). [2] Herrenknecht K., Ozawa M., Eckerskorn C., Lottspeich F., Lenter
M., Kemler R. Proc. Natl. Acad. Sci. U.S.A. 88:9156-9160(1991).
[1614] 712. (Vitellogenin N) Lipoprotein amino terminal region

[1615] This family contains regions from: vitellogenin, microsomal triglyceride transfer protein and apolipoprotein
B-100. These proteins are all involved in lipid transport [1]. This family contains the LV1n chain from lipovitellin, which
contains two structural domains. Number of members: 33

[1616] [1] The structural basis of lipid interactions in lipovitellin, a soluble lipoprotein. Anderson TA, Levitt DG,
Banaszak LJ; Structure 1998;6:895-909.

[1617] 713. (VMSA) Major surface antigen from hepadnavirus
[1618] 714. ssDNA binding protein (Viral DNA bp)
This protein is found in herpesviruses and is needed for replication.
[1619] 715. (Voltage CLC) Voltage gated chloride channels

[1620] This family of ion channels contains 10 or 12 transmembrane helices. Each protein forms a single pore. It
has been shown that some members of this family form homodimers. These proteins contain two CBS domains.

[1] Schmidt-Rose T, Jentsch TJ; J Biol Chem 1997;272:20515-20521.

[2] Zhang J, George AL Jr, Griggs RC, Fouad GT, Roberts J, Kwiecinski H, Connolly AM, Ptacek LJ; Neurology
1996;47:993-998.

[1621] 716. von Willebrand factor type A domain (vwa)

More von Willebrand factor type A domains? Sequence similarities with malaria thrombospondin-related anonymous
protein, dihydropyridine-sensitive calcium channel and inter-alpha-trypsin inhibitor.
Bork P, Rohde K; Biochem J 1991;279:908-911.



1. RUGGERI, Z.M. and WARE, J.
von Willebrand factor.
FASEB J. 7 308-316 (1993).

2. COLOMBATTI, A., BONALDO, P. and DOLIANA, R.

Type A modules: interacting domains found in several non-fibrillar collagens and in other extracellular matrix pro-

is involved in both ubiquinone and menaquinone biosynthesis and which catalyzes the S-adenosylmethionine-dependent
methylation of 2-polyprenyl-6-methoxy-1,4-benzoquinol into 2-polyprenyl-3-methyl-6-methoxy-1,4-benzoquinol
and of demethylmenaquinol into menaquinol. - Yeast COQ5, a ubiquinone biosynthesis methyltransferase. - Bacillus
subtilis spore germination protein C2 (gene: gerCB or gerC2), a probable menaquinone biosynthesis methyltransferase.
- Lactococcus lactis gerC2 homolog. - Caenorhabditis elegans hypothetical protein ZK652.9. - Leishmania donovani
amastigote-specific protein Ml. These are hydrophilic proteins of about 30 Kd (except for ZK652.9 which is 65 Kd).
They can be picked up in the database by the following patterns.
Consensus pattern: Y-D-x-M-N-x(2)-[LIVM]-S-x(3)-H-x(2)-W
Consensus pattern: R-V-[LIVM]-K-[PV]-G-G-x-[LIVMF]-x(2)-[LIVM]-E-x-S
[1] Lee P.T., Hsu A.Y., Ha H.T., Clarke C.F. J. Bacteriol. 179:1748-1754(1997).
[1607] 707. Uricase signature

Uricase (urate oxidase) [1] is the peroxisomal enzyme responsible for the degradation of urate into allantoin. Some
species, like primates and birds, have lost the gene for uricase and are therefore unable to degrade urate. Uricase is
a protein of 300 to 400 amino acids. A highly conserved region located in the central part of the sequence has been
used as a signature pattern.

Consensus pattern: [LV]-x-[LV]-[LIV]-K-[STV]-[ST]-x-[SN]-x-F-x(2)-[FY]-x(4)-[FY]-x(2)-L-x(5)-R
[1] Motojima K., Kanaya S., Goto S. J. Biol. Chem. 263:16677-16681(1988).
[1608] 708. Universal stress protein family (Usp)

[1609] Induced by a wide range of stress conditions. Members of the Usp family are predicted to be related to the
MADS-box transcription factors and to bind DNA [2]. Number of members: 39

[1] Expression and role of the universal stress protein, UspA, of Escherichia coli during growth arrest. Nystrom T,
Neidhardt FC; Mol Microbiol 1994;11:537-544.

[2] Sequence analysis of eukaryotic developmental proteins: ancient and novel domains. Mushegian AR, Koonin
EV; Genetics 1996;144:817-828.

[1610] 709. Ubiquitin domain signature and profile 

owi!" '^f *^ ^ "'^^fi" ^ ^ residues, found in all eukaryotic ceNs and whose sequence is 

extremely well conserved from protozoan to vertebrates. It plays a key role in a variety of cellular processes, such as
ATP-dependent selective degradation of cellular proteins, maintenance of chromatin structure, regulation of gene
expression, stress response and ribosome biogenesis. In most species, there are many genes coding for ubiquitin. Ho^-

to tail repeats of ubx,urt.n. The number of repeats is variable (up to twelve in a Xenopus gene). l7the majority of 

^?pZT1^^^ « ^■•^""'"^ ^"^^^ (CE?). There are two types of 

Su ^Z^^ 1° '^J***^ P'**"^- "^^^ « globular protein, the last four C-tem,inal r^es 
;„ extending from the compact stmcture to fomi a laif. important for its f unctwn. The latter is mediated 

by the ooj«lent conjugat«n of ubiquitin to target proteins, by an isopeptide linkage between the CHemiinal glycine and 

r ^''T! '^"^"^ ^^9"* P™'**^- ^^'^ ^™ « °» which are ^uttonary 

Z^ JI^ 1^ ^ eukaryotfc counterparts. - Mammalian piotein GDX (41 GOX is com- 

p^ Of two domajns. a N-tem,inal ubiquitin-Iike domam of 74 residues and a C4em,inal domain of M resWues with 
some similarly with the thyrogtobulin homwnogenic site. - Mammalian protein FAU [5J. FAU is a fusion protein which 
«jnsist Of a N-tem,inal ubk,ultin-like protein of 74 residues fused to ribosomal protein S30. - Mou^ pr«S2. SeSS 
16] a ubqurtm-like protein ^81 residues. - Human protein BATS, a large fusion protein of 1132 residues that conS« 
a N-ternimal ubiquitin-like domain. - Caenorhabditis elegans protein ubM [7]. UbH is a fuskxi protein which consist 

ol,^^ ? TT^ ^ N-tennmal domain that seems to be distantly, yet signiffcantly. related to ubiquitin - Mammalian 

R>^23-relatedprote.nsRAD23Aandf1A023B.-MammalianBCL-2bindingathanogene-l^ 

^rStr^rv domain. - Human spliceosome associated pratein 114 (sJ?il4 

uh^!? 1? ^ ; u*"" ^ '"^^''^^ " ''"P««««°" ^ Which contains a N-tem,ina^ 

ubK,u.t.n^ike doma«i - Human protein CKAPl/TFCB, Schizosacchaiomyces pombe protein alpll and CaenoriSS 
elegans hypothetK^I protein F53F4.3. These proteins contain a N-tem,inal ubk,uitin domain and a cSZafSS 
mk r!l!L" Schizosaccharomyces pombe hypothetical protein SpAC26A3.16. This protein contains a N-tem,inaI 
n^.? c^"!' ^"^^ ' """^ "biquitin-like proteins SMT3A and SMT3B. - Human ubkiuitin-like 

protein SMT3C (ateo known as PICI; Ubh; Sum^l; Gmp-l or Sentrin). This protein is involved in targetingranSAPI 
S.^, 1^' Po;e complex protein ranBP2. - SMT3-like proteins in plants and Caenort«bditis ele^ns. To identify 
ubK,urt.n and related protems. a pattem has been devetoped based on conserved positkins in the^ral section^ 
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amino acid residues. Structurally G-beta consists of eight tandem repeats of about 40 residues, each containing a 
central Trp-Asp motif (this type of repeat is sometimes called a WD-40 repeat). Such a repetitive segment has been 
shown [E1,2,3,4,5] to exist in a number of other proteins listed below:

- Yeast STE4, a component of the pheromone response pathway. STE4 is a G-beta like protein that associates with
GPA1 (G-alpha) and STE18 (G-gamma).

- Yeast MSI1, a negative regulator of RAS-mediated cAMP synthesis. MSI1 is most probably also a G-beta protein.

- Human and chicken protein 12.3. The function of this protein is not known, but on the basis of its similarity to
G-beta proteins, it may also function in signal transduction.

- Chlamydomonas reinhardtii gblp. This protein is most probably the homolog of vertebrate protein 12.3.

- Human LIS1, a neuronal protein involved in type-1 lissencephaly [E2].

- Mammalian coatomer beta' subunit (beta'-COP), a component of a cytosolic protein complex that reversibly
associates with Golgi membranes to form vesicles that mediate biosynthetic protein transport.

- Yeast CDC4, essential for initiation of DNA replication and separation of the spindle pole bodies to form the poles
of the mitotic spindle.

- Yeast CDC20, a protein required for two microtubule-dependent processes: nuclear movements prior to anaphase
and chromosome separation.

- Yeast MAK11, essential for cell growth and for the replication of M1 double-stranded RNA.

- Yeast PRP4, a component of the U4/U6 small nuclear ribonucleoprotein with a probable role in mRNA splicing.

- Yeast PWP1, a protein of unknown function.

- Yeast SKI8, a protein essential for controlling the propagation of double-stranded RNA.

- Yeast SOF1, a protein required for ribosomal RNA processing which associates with U3 small nucleolar RNA.

- Yeast TUP1 (also known as AER2 or SFL2 or CYC9), a protein which has been implicated in dTMP uptake,
catabolite repression, mating sterility, and many other phenotypes.

- Yeast YCR57C, an ORF of unknown function from chromosome III.

- Yeast YCR72C, an ORF of unknown function from chromosome III.

- Slime mold coronin, an actin-binding protein.

- Slime mold AAC3, a developmentally regulated protein of unknown function.

- Drosophila protein Groucho (formerly known as E(spl); 'enhancer of split'), a protein involved in neurogenesis and
that seems to interact with the Notch and Delta proteins.

- Drosophila TAF-II-80, a protein that is tightly associated with TFIID.

[1626] The number of repeats in the above proteins varies between 5 (PRP4, TUP1, and Groucho) and 8 (G-beta,
STE4, MSI1, AAC3, CDC4, PWP1, etc.). In G-beta and G-beta like proteins, the repeats span the entire length of the
sequence, while in other proteins, they make up the N-terminal, the central or the C-terminal section.
[1627] A signature pattern can be developed from the central core of the domain (positions 9 to 23). 

- Consensus pattern: [LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-
x(2)-[LIVMWSTAC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN]
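
The consensus patterns in this section use the PROSITE notation, in which elements are separated by hyphens, `x` stands for any residue, square brackets list the residues allowed at a position, and a number in parentheses is a repeat count. The following minimal sketch (not part of the original disclosure) shows how such a pattern might be translated into a Python regular expression for scanning a protein sequence. The function name `prosite_to_regex`, the cleaned-up form of the WD-repeat pattern, and the test peptide are illustrative assumptions.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Supported elements: single residues, x (any residue), [ABC] (allowed set),
    {ABC} (excluded set) and repeat counts such as (2) or (3,5).
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate an optional repeat suffix, e.g. "x(2)" -> ("x", "2", None).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        # "[ABC]" sets and single letters are already valid regex syntax.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)


# WD-repeat signature as reconstructed from the text (an assumption).
WD_PATTERN = (
    "[LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-"
    "x(2)-[LIVMWSTAC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN]"
)

if __name__ == "__main__":
    regex = prosite_to_regex(WD_PATTERN)
    # Synthetic peptide built to satisfy the pattern; not a real protein.
    hit = re.search(regex, "PPPAAAAKKDKKAKAWDAPPP")
    print(regex)
    print(hit.start() if hit else None)
```

The same helper can be applied to the other consensus patterns listed in this section, provided they are first written in clean PROSITE syntax.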



[ 1] Gilman A.G.
Annu. Rev. Biochem. 56:615-649(1987).
[ 2] Duronio R.J., Gordon J.I., Boguski M.S.
Proteins 13:41-66(1992).
[ 3] van der Voorn L., Ploegh H.L.
FEBS Lett. 307:131-134(1992).
[ 4] Neer E.J., Schmidt C.J., Nambudripad R., Smith T.F.
Nature 371:297-300(1994).
[ 5] Smith T.F., Gaitatzes C.G., Saxena K., Neer E.J.
Biochemistry In Press(1998).



[1628] 718. WHEP-TRS domain containing proteins 

A conserved domain of 46 amino acids has been shown [1] to exist in a number of higher eukaryote aminoacyl-transfer
RNA synthetases. This domain is present one to six times in the following enzymes:
teins.

MATRIX 13 297-306 (1993). 

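3. PERKINS, S.J., SMITH, K.F., WILLIAMS, S.C., HARIS, P.I., CHAPMAN, D. and SIM, R.B.
The secondary structure of the von Willebrand factor type A domain in factor B of human complement by Fourier
transform infrared spectroscopy. Its occurrence in collagen types VI, VII, XII and XIV, the integrins and other
proteins by averaged structure predictions.
J.MOL.BIOL. 238 104-119 (1994).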

4. BORK, P. and ROHDE, K.

More von Willebrand factor type A domains? Sequence similarities with malaria thrombospondin-related anonv 

5. EDWARDS, Y.J.K. and PERKINS, S.J.
FEBS LETT. 358 283-286 (1995). 

6. LEE, J.O., RIEU, P., ARNAOUT, M.A. and LIDDINGTON, R.
Crystal structure of the A domain from the alpha subunit of integrin CR3 (CD11b/CD18).
CELL 80 631-638 (1995).

7. QU, A. and LEAHY, D.J.
Crystal structure of the I-domain from the CD11a/CD18 (LFA-1, alpha L beta 2) integrin.
PROC.NATL.ACAD.SCI.USA 92 10277-10281 (1995).

[1622] The von Willebrand factor is a large multimeric glycoprotein found in blood plasma. Mutant forms are involved

superfamily. The vWF domain is found in various plasma proteins: complement factors B, C2, CR3 and CR4; the
integrins (I-domains); collagen types VI, VII, XII and XIV; and other extracellular proteins [2-4]. Proteins that incorporate
vWF domains participate in numerous biological events (e.g., cell adhesion, migration, homing, pattern formation, and

aligned vWF sequences has revealed a largely alternating sequence of alpha-helices and beta-strands [3]. Fold recognition
algorithms were used to score sequence compatibility with a library of known structures: the vWF domain fold was

predicted to be a doubly-wound, open, twisted beta-sheet flanked by alpha-helices [5]. 3D structures have been
determined for the I-domains of integrins CD11b (with

surface. It has been suggested that this site represents a general metal ion-dependent adhesion site (MIDAS) for
binding protein ligands [6]. The residues constituting the MIDAS motif in the CD11b and CD11a I-domains are
completely conserved, but the manner in which the metal ion is coordinated differs slightly [7].

[1623] VWFADOMAIN is a 3-element fingerprint that provides a signature for the vWF domain superfamily. The

residues involved in metal ion coordination in I-domains (Asp and 2 serines in positions 8, 10 and 12 respectively);
motif 2 spans strands beta-2 and beta-2'; and motif 3 encodes beta-strand 3 and an

coordinates the metal ion [6,7]. Three iterations on OWL27.0 were required to reach convergence, at which point a
true set comprising 56 sequences was identified. Numerous partial matches were also found.
[1624] 717. (WD40) WD domain, G-beta repeat
The ancient regulatory-protein family of WD-repeat proteins.
Neer EJ, Schmidt CJ, Nambudripad R, Smith TF;
Nature 1994;371:297-300.

Beta-transducin (G-beta) is one of the three subunits (alpha, beta, and gamma) of the guanine nucleotide-binding
proteins (G proteins) which act as intermediaries in the transduction of signals generated by transmembrane receptors
[1]. The alpha subunit binds to and hydrolyzes GTP; the functions of the beta and gamma subunits are less clear but
they seem to be required for the replacement of GDP by GTP as well as for membrane anchoring and receptor recognition.

[1625] In higher eukaryotes G-beta exists as a small multigene family of highly conserved proteins of about 340
of two main subsets:

- Subset 1, to which belongs XPG, RAD2 from budding yeast and rad13 from fission yeast. RAD2 and XPG are
single-stranded DNA endonucleases [7,8]. XPG makes the 3' incision in human DNA nucleotide excision repair [9].

- Subset 2, to which belongs mouse and human FEN-1, rad2 from fission yeast, and RAD27 from budding yeast.
FEN-1 is a structure-specific endonuclease.

In addition to the proteins listed in the above groups, this family also includes:

- Fission yeast exo1, a 5'->3' double-stranded DNA exonuclease that could act in a pathway that corrects
mismatched base pairs.

- Yeast EXO1 (DHS1), a protein with probably the same function as exo1.

- Yeast DIN7.

Sequence alignment of this family of proteins reveals that similarities are largely confined to two regions. The first is
located at the N-terminal extremity (N-region) and corresponds to the first 95 to 105 amino acids. The second region
is internal (I-region) and found towards the C-terminus; it spans about 140 residues and contains a highly conserved
core of 27 amino acids that includes a conserved pentapeptide (E-A-[DE]-A-[QS]). It is possible that the conserved
acidic residues are involved in the catalytic mechanism of DNA excision repair in XPG. The amino acids linking the N-
and I-regions are not conserved; indeed, they are largely absent from proteins belonging to the second subset. Two
signature patterns have been developed for these proteins. The first corresponds to the central part of the N-region,
the second to part of the I-region and includes the putative catalytic core pentapeptide.
[1636] Consensus pattern: [VI]-[KRE]-P-x-[FYIL]-V-F-D-G-x(2)-[PIL]-x-[LVC]-K
Consensus pattern: [GS]-[LIVM]-[PER]-[FYS]-[LIVM]-x-A-P-x-E-A-[DE]-[PAS]-[QS]-[CLM]

[1637] [ 1] Tanaka K., Wood R.D. Trends Biochem. Sci. 19:83-86(1994).
[ 2] Scherly D., Nouspikel T., Corlet J., Ucla C., Bairoch A., Clarkson S.G. Nature 363:182-185(1993).
[ 3] Carr A.M., Sheldrick K.S., Murray J.M., Al-Harithy R., Watts F.Z., Lehmann A.R. Nucleic Acids Res. 21:1345-1349(1993).
[ 4] Murray J.M., Tavassoli M., Al-Harithy R., Sheldrick K.S., Lehmann A.R., Carr A.M., Watts F.Z. Mol. Cell. Biol. 14:4878-4888(1994).
[ 5] Harrington J.J., Lieber M.R. Genes Dev. 8:1344-1355(1994).
[ 6] Szankasi P., Smith G.R. Science 267:1166-1169(1995).
[ 7] Habraken Y., Sung P., Prakash L., Prakash S. Nature 366:365-368(1993).
[ 8] O'Donovan A., Scherly D., Clarkson S.G., Wood R.D. J. Biol. Chem. 269:15965-15968(1994).
[ 9] O'Donovan A., Davies A.A., Moggs J.G., West S.C., Wood R.D. Nature 371:432-435(1994).

[1638] 722. Xanthine/uracil permeases family 

The following transport proteins, which are involved in the uptake of xanthine or uracil, are evolutionarily related [1]:

- Uric acid-xanthine permease (gene uapA) from Aspergillus nidulans.

- Purine permease (gene uapC) from Aspergillus nidulans.

- Xanthine permease from Bacillus subtilis (gene pbuX).

- Uracil permease from Escherichia coli (gene uraA) [2] and Bacillus (gene pyrP).

- Hypothetical protein ycdG from Escherichia coli.

- Hypothetical protein ygfO from Escherichia coli.

- Hypothetical protein ygfU from Escherichia coli.

- Hypothetical protein yicE from Escherichia coli.

- Hypothetical protein yunJ from Bacillus subtilis.

- Hypothetical protein yunK from Bacillus subtilis.

[1639] They are proteins of from 430 to 595 residues that seem to contain 12 transmembrane domains.

The best conserved region, which corresponds to what seems to be the tenth transmembrane domain, has been
selected as a signature pattern.

- Consensus pattern: [LIVM]-P-x-[PASIF]-V-[LIVM]-G-G-x(4)-[LIVM]-[FY]-[GSA]-x-[LIVM]-x(3)-G

[ 1] Diallinas G., Gorfinkiel L., Arst G., Cecchetto G., Scazzocchio C. J. Biol. Chem. 270:8610-8622(1995).
[ 2] Andersen P.S., Frees D., Fast R., Mygind B. J. Bacteriol. 177:2008-2013(1995).

[1640] 723. Hypothetical yabO/yceC/sfhB family

The following proteins, which seem to belong to a family of pseudouridine synthases (EC 4.2.1.70) [1], have been
shown to share regions of similarities:

- Escherichia coli and Haemophilus influenzae ribosomal large subunit pseudouridine synthase A (gene rluA). It is
responsible for synthesis of pseudouridine from uracil-746 in 23S rRNA.

- Escherichia coli and Haemophilus influenzae ribosomal large subunit pseudouridine synthase C (gene rluC). It is
responsible for synthesis of pseudouridine from uracil at positions 955, 2504 and 2580 in 23S rRNA.

- Escherichia coli protein and homologs in other bacteria large subunit pseudouridine synthase D (gene rluD).

- Yeast DRAP deaminase (gene RIB2).

- Escherichia coli hypothetical protein yqcB and HI1435, the corresponding Haemophilus influenzae protein.
- Mammalian multifunctional aminoacyl-tRNA synthetase. The domain is present three times in a region that separates
the N-terminal glutamyl-tRNA synthetase domain from the C-terminal prolyl-tRNA synthetase domain.

- Drosophila multifunctional aminoacyl-tRNA synthetase. The domain is present six times in the intercatalytic region.

- Mammalian tryptophanyl-tRNA synthetase. The domain is found at the N-terminal extremity.

- Mammalian, insect, nematode and plant glycyl-tRNA synthetase. The domain is found at the N-terminal extremity.

- Mammalian histidyl-tRNA synthetase. The domain is found at the N-terminal extremity.

[1629] This domain, called WHEP-TRS, could contain a central alpha-helical region and may play a role in
the association of tRNA-synthetases into multienzyme complexes.
[1630] A signature pattern based on the first 29 positions of the WHEP-Domain has been developed. 

- Consensus pattern: [QY]-G-[DNEA]-x-[LIV]-[KR]-x(2)-K-x(2)-[KRNG]-[AS]-x(4)-[LIV]-[DENK]-x(2)-[IV]-x(2)-L-x(3)-K

[ 1] Cerini C., Kerjan P., Astier M., Gratecos D., Mirande M., Semeriva M. EMBO J. 10:4267-4277(1991).

[ 2] Nada S., Chang P.K., Dignam J.D.
J. Biol. Chem. 268:7660-7667(1993).

[1631] 719. (Worm family 8) Putative membrane protein
Analysis of protein domain families in Caenorhabditis elegans.
Sonnhammer EL, Durbin R;
Genomics 1997;46:200-216.

This family, called family 8 in [1], may be a transmembrane protein.
The specific function of this protein is unknown.
[1632] 720. Xylose isomerase

Xylose isomerase (EC 5.3.1.5) [1] is an enzyme found in microorganisms which catalyzes the interconversion of
D-xylose to D-xylulose. It can also isomerize D-ribose to D-ribulose and D-glucose to D-fructose. Xylose isomerase seems
to require magnesium for its activity, while cobalt is necessary to stabilize the tetrameric structure of the enzyme. A
number of residues are conserved in all known xylose isomerases.

[1633] Xylose isomerase also exists in plants [2], where it is homodimeric and is manganese-dependent.

[1634] Two signature patterns for xylose isomerase have been developed. The first one is derived from a stretch
of the central part of the protein which contains a glutamic acid residue known to be one of the four residues involved
in the binding of the magnesium ion [3]; this pattern also includes a lysine residue which is involved in the catalytic
activity of the enzyme. The second pattern is derived from a conserved region in the N-terminal section of the enzyme,
which includes a histidine residue that has been shown [4] to be involved in the catalytic mechanism of the enzyme.

- Consensus pattern: [LI]-E-P-K-P-x(2)-P
[E is a magnesium ligand]
[K is an active site residue]

- Consensus pattern: [FL]-H-D-x-D-[LIV]-x-[PD]-x-[GDE]
[H is an active site residue]

[ 1] Dauter Z., Dauter M., Hemker J., Witzel H., Wilson K.S.
FEBS Lett. 247:1-8(1989).

[ 2] Kristo P.A., Saarelainen R., Fagerstrom R., Aho S., Korhola M.
Eur. J. Biochem. 237:240-246(1996).
[ 3] Henrick K., Collyer C.A., Blow D.M.
J. Mol. Biol. 208:129-157(1989).

[ 4] Vangrysperre W., Ampe C., Kersters-Hilderson H., Tempst P.
Biochem. J. 263:195-199(1989).

[1635] 721. XPG protein signatures. Xeroderma pigmentosum (XP) [1] is a human autosomal recessive disease,
characterized by a high incidence of sunlight-induced skin cancer. People's skin cells with this condition are
hypersensitive to ultraviolet light, due to defects in the incision step of DNA excision repair.
genetic complementation groups involved in this pathway: XP-A to XP-G.
133 Kd nuclear protein called XPG (or XPGC) [2]. XPG belongs to a family of proteins [2,3,4,5,6] that are composed
Arg and Lys.

- Carboxypeptidase N (EC 3.4.17.3) (also known as arginine carboxypeptidase), a plasma enzyme which protects
the body from potent vasoactive and inflammatory peptides containing C-terminal Arg or Lys (such as kinins or
anaphylatoxins) which are released into the circulation.

- Carboxypeptidase H (EC 3.4.17.10) (also known as enkephalin convertase or carboxypeptidase E), an enzyme
located in secretory granules of pancreatic islets, adrenal gland, pituitary and brain. This enzyme removes residual
C-terminal Arg or Lys remaining after initial endoprotease cleavage during prohormone processing.

- Carboxypeptidase M (EC 3.4.17.12), a membrane bound Arg and Lys specific enzyme. It is ideally situated to act
on peptide hormones at local tissue sites where it could control their activity before or after interaction with
specific plasma membrane receptors.

- Mast cell carboxypeptidase (EC 3.4.17.1), an enzyme with a specificity similar to that of carboxypeptidase A, but
found in the secretory granules of mast cells.

- Streptomyces griseus carboxypeptidase (Cpase SG) (EC 3.4.17.-) [3], which combines the specificities of
mammalian carboxypeptidases A and B.

- Thermoactinomyces vulgaris carboxypeptidase T (EC 3.4.17.18) (CPT) [4], which also combines the specificities
of carboxypeptidases A and B.

- AEBP1 [5], a transcriptional repressor active in preadipocytes. AEBP1 seems to regulate transcription by cleavage
of other transcriptional proteins.

- Yeast hypothetical protein YHR132c.

[1648] All of these enzymes bind an atom of zinc. Three conserved residues are implicated in the binding of the zinc
atom: two histidines and a glutamic acid. Two signature patterns which contain these three zinc ligands have been
derived.

- Consensus pattern: [PK]-x-[LIVMFY]-x-[LIVMFY]-x(4)-H-[STAG]-x-E-x-[LIVM]-[STAG]-x(6)-[LIVMFYTA] [H and E
are zinc ligands]

- Consensus pattern: H-[STAG]-x(3)-[LIVME]-x(2)-[LIVMFYW]-P-[FYW] [H is a zinc ligand]

[ 1] Tan F., Chan S.J., Steiner D.F., Schilling J.W., Skidgel R.A.
J. Biol. Chem. 264:13165-13170(1989).

[ 2] Reynolds D.S., Stevens R.L., Gurley D.S., Lane W.S., Austen K.F.,
Serafin W.E.
J. Biol. Chem. 264:20094-20099(1989).

[ 3] Narahashi Y.
J. Biochem. 107:879-886(1990).

[ 4] Teplyakov A., Polyakov K., Obmolova G., Strokopytov B., Kuranova I.,
Osterman A.L., Grishin N.V., Smulevitch S.V., Zagnitko O.P.,
Galperina O.V., Matz M.V., Stepanov V.M.
Eur. J. Biochem. 208:281-288(1992).

[ 5] He G.-P., Muise A., Li A.W., Ro H.-S.
Nature 378:92-96(1995).

[ 6] Hourdou M.-L., Guinand M., Vacheron M.J., Michel G., Denoroy L.,
Duez C.M., Englebert S., Joris B., Weber G., Ghuysen J.-M.
Biochem. J. 292:563-570(1993).

[ 7] Rawlings N.D., Barrett A.J.
Meth. Enzymol. 248:183-228(1995).

[1649] 726. Zinc finger. C2H2 type 

The C2H2 zinc finger is the classical zinc finger domain. 

The two conserved cysteines and histidines coordinate a zinc ion. The following pattern describes the zinc finger:
#-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]
where X can be any amino acid, and numbers in brackets indicate the number of residues. The positions marked #
are those that are important for the stable fold of the zinc finger. The final position can be either His or Cys.
The C2H2 zinc finger is composed of two short beta strands followed by an alpha helix. The amino terminal part of the
helix binds the major groove in DNA binding zinc fingers.
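
The C2H2 pattern above can be written as a regular expression with named groups, so that the positions of the four zinc ligands are reported for each candidate finger. The sketch below is illustrative only: the function names, the default treatment of the '#' positions as "any residue", and the test peptide are assumptions and are not taken from the original text.

```python
import re


def c2h2_regex(fold_class="."):
    """Compile #-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C] as a regex.

    `fold_class` is the regex fragment used for the '#' positions. The text
    only says these positions matter for the fold, so the default accepts
    any residue (an assumption); a residue set such as "[LIVMFYW]" may be
    passed instead.
    """
    return re.compile(
        "(?P<f1>{f}).(?P<c1>C).{{1,5}}(?P<c2>C).{{3}}(?P<f2>{f}).{{5}}(?P<f3>{f})"
        ".{{2}}(?P<h1>H).{{3,6}}(?P<h2>[HC])".format(f=fold_class)
    )


def zinc_ligands(sequence, fold_class="."):
    """Return 0-based positions of the four zinc ligands for each match."""
    hits = []
    for match in c2h2_regex(fold_class).finditer(sequence):
        hits.append({name: match.start(name) for name in ("c1", "c2", "h1", "h2")})
    return hits


if __name__ == "__main__":
    # Illustrative peptide with a classical Cys2-His2 spacing.
    peptide = "YACPVESCDRRFSRSDELTRHIRIHTGQKP"
    for hit in zinc_ligands(peptide):
        print(hit)
```

Because `finditer` returns non-overlapping matches, tandem fingers in a longer sequence are reported one after another; overlapping candidates would require a lookahead-based scan instead.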

[1650] 'Zinc finger' domains [1-5] are nucleic acid-binding protein structures first identified in the Xenopus transcrip-
- Haemophilus influenzae hypothetical protein HI0042.

- Aquifex aeolicus hypothetical protein AQ_1758.

- Bacillus subtilis hypothetical protein yhcT.

- Bacillus subtilis hypothetical protein yjbO.

- Bacillus subtilis hypothetical protein ytyS.

- Helicobacter pylori hypothetical protein HP0347.

- Helicobacter pylori hypothetical protein HP0745.

- Helicobacter pylori hypothetical protein HP0956.

- Mycoplasma genitalium hypothetical protein MG209.

- Mycoplasma genitalium hypothetical protein MG370.

- Synechocystis strain PCC 6803 hypothetical protein slr1592.

- Synechocystis strain PCC 6803 hypothetical protein slr1629.

- Yeast hypothetical protein YDL036c.

- Yeast hypothetical protein YGR169c.

- Fission yeast hypothetical protein SpAC18B11.02c.

- Caenorhabditis elegans hypothetical protein K07E8.7.

Si"y L teT^S uTi^L*!.'?^^' K ?K ""f ^""^"^ ^ ""'"'^^ °' ^^''^ h their central section 

They can be picked up in the database by the following highly conserved pattern.

- Consensus pattern: [LIVCA]-[NHYT]-R-[LI]-D-x(2)-T-[STA]-G-[LIVAGC]-[LIVMF](2)-[LIVMFGC]-[SGTAC]

[ 1] Conrad J., Sun D., Englund N., Ofengand J. J. Biol. Chem. 273:18562-18566(1998).

The following proteins, which seem to belong to a family of pseudouridine synthases
(EC 4.2.1.70) [1], also have been shown to share regions of similarities:

- Escherichia coli and Haemophilus influenzae 16S pseudouridylate 516 synthase (EC 4.2.1.70).

- Aquifex aeolicus hypothetical protein AQ_1464.

- Bacillus subtilis hypothetical protein ypuL.

- Bacillus subtilis hypothetical protein ytzF.

- Borrelia burgdorferi hypothetical protein BB0129.

- Helicobacter pylori hypothetical protein HP1459.

- Synechocystis strain PCC 6803 hypothetical protein slr0361.

- Synechocystis strain PCC 6803 hypothetical protein slr0612.

- Consensus pattern: G-R-L-D-x(2)-[STA]-x-G-[LIVFA]-[LIVMF](3)-[ST]-[DNST]

[ 1] Wrzesinski J., Bakin A., Nurse K., Lane B.G., Ofengand J. Biochemistry 34:8904-8913(1995).

[1646] 724. Zinc finger present in dystrophin, CBP/p300

ZZ in dystrophin binds calmodulin.
Putative zinc finger; binding not yet shown.
[1647] 725. Zinc carboxypeptidase

There are a number of different types of zinc-dependent carboxypeptidases (EC 3.4.17.-) [1,2]. All
seem to be structurally and functionally related. The enzymes that belong to this family are listed below:

- Carboxypeptidase B (EC 3.4.17.2), also a pancreatic digestive enzyme, but that preferentially removes C-terminal
after the second cysteine; it is generally an aromatic or aliphatic residue.

- Consensus pattern: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H [The two C's and two H's are zinc ligands]

[ 1] Klug A., Rhodes D.
Trends Biochem. Sci. 12:464-469(1987).

[ 2] Evans R.M., Hollenberg S.M.
Cell 52:1-3(1988).

[ 3] Payre F., Vincent A.
FEBS Lett. 234:245-250(1988).

[ 4] Miller J., McLachlan A.D., Klug A.
EMBO J. 4:1609-1614(1985).

[ 5] Berg J.M.
Proc. Natl. Acad. Sci. U.S.A. 85:99-102(1988).

[ 6] Rosenfeld R., Margalit H.
J. Biomol. Struct. Dyn. 11:557-570(1993).

[1654] 727. Zinc finger, C3HC4 type (RING finger) 

A number of eukaryotic and viral proteins contain a conserved cysteine-rich domain of 40 to 60 residues (called C3HC4
zinc-finger or 'RING' finger) [1] that binds two atoms of zinc, and is probably involved in mediating protein-protein
interactions. The 3D structure of the zinc ligation system is unique to the RING domain and is referred to as the
'cross-brace' motif. The spacing of the cysteines in such a domain is
C-x(2)-C-x(9 to 39)-C-x(1 to 3)-H-x(2 to 3)-C-x(2)-C-x(4 to 48)-C-x(2)-C.
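
Because the loops between the zinc ligands of the C3HC4 domain vary widely in length, it can be convenient to check a proposed set of eight ligand positions directly against the allowed gap ranges rather than to scan with a single expression. The following sketch assumes 0-based positions and uses hypothetical names (`RING_GAPS`, `ligand_gaps`, `fits_ring_spacing`); the gap ranges are read from the spacing formula above.

```python
# Ligand residues in order along the chain: three Cys, one His, four Cys.
RING_LIGANDS = "CCCHCCCC"

# Allowed number of residues (min, max) between successive ligands, taken
# from C-x(2)-C-x(9 to 39)-C-x(1 to 3)-H-x(2 to 3)-C-x(2)-C-x(4 to 48)-C-x(2)-C.
RING_GAPS = [(2, 2), (9, 39), (1, 3), (2, 3), (2, 2), (4, 48), (2, 2)]


def ligand_gaps(positions):
    """Number of residues lying strictly between successive ligand positions."""
    return [later - earlier - 1 for earlier, later in zip(positions, positions[1:])]


def fits_ring_spacing(sequence, positions):
    """Return True if the eight 0-based positions satisfy the C3HC4 spacing."""
    if len(positions) != len(RING_LIGANDS):
        return False
    for position, residue in zip(positions, RING_LIGANDS):
        # Each position must exist and carry the expected ligand residue.
        if not 0 <= position < len(sequence) or sequence[position] != residue:
            return False
    return all(
        low <= gap <= high
        for gap, (low, high) in zip(ligand_gaps(positions), RING_GAPS)
    )


if __name__ == "__main__":
    # Synthetic sequence using the shortest allowed loops; not a real protein.
    demo = "CAAC" + "A" * 9 + "CAHAACAAC" + "AAAA" + "CAAC"
    print(fits_ring_spacing(demo, [0, 3, 13, 15, 18, 21, 26, 29]))
```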

[1655] Proteins currently known to include the C3HC4 domain are listed below (references are only provided for 
recently determined sequences). 

- Mammalian V(D)J recombination activating protein (gene RAG1). RAG1 activates the rearrangement of
immunoglobulin and T-cell receptor genes.

- Mouse rpt-1. Rpt-1 is a trans-acting factor that regulates gene expression directed by the promoter region of the
interleukin-2 receptor alpha chain or the LTR promoter region of HIV-1.

- Human rfp. Rfp is a developmentally regulated protein that may function in male germ cell development.
Recombination of the N-terminal section of rfp with a protein tyrosine kinase produces the ret transforming protein.

- Human 52 Kd Ro/SS-A protein. A protein of unknown function from the Ro/SS-A ribonucleoprotein complex. Sera
from patients with systemic lupus erythematosus or primary Sjogren's syndrome often contain antibodies that react
with the Ro proteins.

- Human histocompatibility locus protein RING1.

- Human PML, a probable transcription factor. Chromosomal translocation of PML with retinoic receptor alpha
creates a fusion protein which is the cause of acute promyelocytic leukemia (APL).

- Mammalian breast cancer type 1 susceptibility protein (BRCA1) [E1].

- Mammalian cbl proto-oncogene.

- Mammalian bmi-1 proto-oncogene.

- Vertebrate CDK-activating kinase (CAK) assembly factor MAT1, a protein that stabilizes the complex between the
CDK7 kinase and cyclin H (MAT1 stands for 'Menage A Trois').

- Mammalian mel-18 protein. Mel-18, which is expressed in a variety of tumor cells, is a transcriptional repressor that
recognizes and binds a specific DNA sequence.

- Mammalian peroxisome assembly factor-1 (PAF-1) (PMP35), which is somewhat involved in the biogenesis of
peroxisomes. In humans, defects in PAF-1 are responsible for a form of Zellweger syndrome, an autosomal
recessive disorder associated with peroxisomal deficiencies.

- Human MAT1 protein, which interacts with the CDK7-cyclin H complex.

- Human RING1 protein.

- Xenopus XNF7 protein, a probable transcription factor.

- Trypanosoma protein ESAG-8 (T-LR), which may be involved in the postranscriptional regulation of genes in VSG
expression sites or may interact with adenylate cyclase to regulate its activity.

- Drosophila proteins Posterior Sex Combs (Psc) and Suppressor two of zeste (Su(z)2). The two proteins belong
to the Polycomb group of genes needed to maintain the segment-specific repression of homeotic selector genes.

- Drosophila protein male-specific msl-2, a DNA-binding protein which is involved in X chromosome dosage
compensation (the elevation of transcription of the male single X chromosome).

- Arabidopsis thaliana protein COP1, which is involved in the regulation of photomorphogenesis.
[ 1] Aitken A.
Trends Biochem. Sci. 20:95-97(1995).

[ 2] Morrison D.
Science 266:56-57(1994).

[ 3] Xiao B., Smerdon S.J., Jones D.H., Dodson G.G., Soneji Y., Aitken A., Gamblin S.J.
Nature 376:188-191(1995).

[1670] 733. D-isomer specific 2-hydroxyacid dehydrogenases (2-Hacid_DH)

This Pfam covers the Formate dehydrogenase, D-glycerate dehydrogenase and D-lactate dehydrogenase families in
SCOP. A number of NAD-dependent 2-hydroxyacid dehydrogenases which seem to be specific for the D-isomer of
their substrate have been shown [1,2,3,4] to be functionally and structurally related. These enzymes are listed below.

- D-lactate dehydrogenase (EC 1.1.1.28), a bacterial enzyme which catalyzes the reduction of D-lactate to pyruvate.

- D-glycerate dehydrogenase (EC 1.1.1.29) (NADH-dependent hydroxypyruvate reductase), a plant leaf peroxisomal
enzyme that catalyzes the reduction of hydroxypyruvate to glycerate. This reaction is part of the glycolate
pathway of photorespiration.

- D-glycerate dehydrogenase from the bacteria Hyphomicrobium methylovorum and Methylobacterium extorquens.

- 3-phosphoglycerate dehydrogenase (EC 1.1.1.95), a bacterial enzyme that catalyzes the oxidation of
D-3-phosphoglycerate to 3-phosphohydroxypyruvate. This reaction is the first committed step in the 'phosphorylated'
pathway of serine biosynthesis.

- Erythronate-4-phosphate dehydrogenase (EC 1.1.1.-) (gene pdxB), a bacterial enzyme involved in the biosynthesis
of pyridoxine (vitamin B6).

- D-2-hydroxyisocaproate dehydrogenase (EC 1.1.1.-) (D-hicDH), a bacterial enzyme that catalyzes the reversible
and stereospecific interconversion between 2-ketocarboxylic acids and D-2-hydroxy-carboxylic acids.

- Formate dehydrogenase (EC 1.2.1.2) (FDH) from the bacteria Pseudomonas sp. 101 and various fungi [5].

- Vancomycin resistance protein vanH from Enterococcus faecium; this protein is a D-specific alpha-keto acid
dehydrogenase involved in the formation of a peptidoglycan which does not terminate by D-alanine, thus preventing
vancomycin binding.

- Escherichia coli hypothetical protein ycdW.

- Escherichia coli hypothetical protein yiaE.

- Haemophilus influenzae hypothetical protein HI1556.

- Yeast hypothetical protein YER081w.

- Yeast hypothetical protein YIL074w.

[1671] All these enzymes have similar enzymatic activities and are structurally related. Three of the most conserved
regions of these proteins have been selected to develop patterns. The first pattern is based on a glycine-rich region
located in the central section of these enzymes; this region probably corresponds to the NAD-binding domain. The two
other patterns contain a number of conserved charged residues, some of which may play a role in the catalytic
mechanism.



- Consensus pattern: [LIVMA]-[AG]-[IVT]-[LIVMFY]-[AG]-x-G-[NHKRQGSAC]-[LIV]-G-x(13,14)-[LIVFMT]-x(2)-
[FYWCTH]-[DNSTK]

- Consensus pattern: [LIVMFYWA]-[LIVFYWC]-x(2)-[SAC]-[DNQHR]-[IVFA]-[LIVF]-x-[LIVF]-[HNI]-x-P-x(4)-[STN]-
x(2)-[LIVMF]-x-[GSDN]

- Consensus pattern: [LMFATC]-[KPQ]-x-[GSTDN]-x-[LIVMFYWR]-[LIVMFYW](2)-N-x-[STAGC]-R-[GP]-x-[LIVH]-
[LIVMC]-[DNV]

[1] Grant G.A. Biochem. Biophys. Res. Commun. 165:1371-1374(1989).

[2] Kochhar S., Hunziker P., Leong-Morgenthaler P.M., Hottinger H. Biochem. Biophys. Res. Commun. 184:60-66
(1992).

[3] Ohta T., Taguchi H. J. Biol. Chem. 266:12588-12594(1991).

[4] Goldberg J.D., Yoshida T., Brick P. J. Mol. Biol. 236:1123-1140(1994).

[5] Popov V.O., Lamzin V.S. Biochem. J. 301:625-643(1994).

[1672] 734. 2-oxo acid dehydrogenases acyltransferase (catalytic domain) 

Refined crystal structure of the catalytic domain of dihydrolipoyl transacetylase (E2p) from Azotobacter vinelandii at
2.6 angstroms resolution.

Mattevi A, Obmolova G, Kalk KH, Westphal AH, De Kok A, Hol WG;
- Fungal DNA repair proteins RAD5, RAD16, RAD18 and rad8.

- Herpesviruses trans-acting transcriptional protein ICP0/IE110. This protein which has been characterized in many

- jrcr'niTs^reTc^r^^^^^ 

- Baculoviruses major immediate early protein (PE-38).

- Baculoviruses immediate-early regulatory protein IE-N/IE-2.

- Caenorhabditis elegans hypothetical proteins F54G8.4, R05D3.4 and T02C1.1.

- Yeast hypothetical proteins YER116c and YKR017c.

[1656] The central region of the domain was selected as a signature pattern for the C3HC4 finger.

- Consensus pattern: C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA]

[1657] [ 1] Borden K.L.B., Freemont P.S.
Curr. Opin. Struct. Biol. 6:395-401(1996).

[1658] 728. Zinc finger C-x8-C-x5-C-x3-H type (and similar).

[1659] 729. Zinc finger, CCHC class

A family of CCHC zinc fingers, mostly from retroviral gag proteins (nucleocapsid). Prototype structure is from HIV.
Also contains members involved in eukaryotic gene regulation, such as C. elegans GLH-1.
Structure is an 18-residue zinc finger; no examples of indels in the alignment.
[1660] 730. Zn-finger in Ran binding protein and others
[1661] 731. AN1-like Zinc finger

[1662] Zinc finger at the C-terminus of An1 (Swiss:Q91889), a ubiquitin-like protein in Xenopus laevis. The following
pattern describes the zinc finger: C-X2-C-X(9-12)-C-X(1-2)-C-X4-C-X2-H-X5-H-X-C, where X can be any amino acid,
and numbers in brackets indicate the number of residues.

[1663] [1] Linnen JM, Bailey CP, Weeks DL; Gene 1993;128:181-188.

[1664] 732. 14-3-3 proteins

Structure of a 14-3-3 protein and implications for coordination of multiple signalling pathways.

Xiao B, Smerdon SJ, Jones DH, Dodson GG, Soneji Y, Aitken A, Gamblin SJ; Nature 1995;376:188-191.

Crystal structure of the zeta isoform of the 14-3-3 protein.

Liu D, Bienkowska J, Petosa C, Collier RJ, Fu H, Liddington R;

Nature 1995;376:191-194.

Cell 1996;84:889-897.

[1666] The 14-3-3 protein binds its target proteins with a common site located towards the C-terminus.

Ichimura T, Ito M, Itagaki C, Takahashi M, Horigome T, Omata S, Ohno S, Isobe T;

FEBS Lett 1997;413:273-276.

[1667] Molecular evolution of the 14-3-3 protein family.

Wang W, Shakes DC;

J Mol Evol 1996;43:384-396.

Function of 14-3-3 proteins.

Jin DY, Lyu MS, Kozak CA, Jeang KT;

Nature 1996;382:308-308.

[1668] The 14-3-3 proteins [1,2,3] are a family of closely related acidic homodimeric proteins of about 30 Kd which
were first identified as being very abundant in mammalian brain tissues and located preferentially in neurons. The

- Consensus pattern: R-N-L-[LIV]-S-[VG]-[GA]-Y-[KN]-N-[IVA]

- Consensus pattern: Y-K-[DE]-S-T-L-I-[IM]-Q-L-[LF]-[RHC]-D-N-[LF]-T-[LS]-W-[TAN]-[SAD]
[1680] Prokaryotic and eukaryotic 6PGD are proteins of about 470 amino acids whose sequences are highly conserved
[1]. A region which has been shown [2], from studies of the sheep 6PGD tertiary structure, to be involved in the binding
of 6-phosphogluconate has been selected as a signature pattern.

- Consensus pattern: [LIVM]-x-D-x(2)-[GA]-[NQS]-K-G-T-G-x-W

[ 1] Reizer A., Deutscher J., Saier M.H. Jr., Reizer J.
Mol. Microbiol. 5:1081-1089(1991).

[ 2] Adams M.J., Archibald I.G., Bugg C.E., Carne A., Gover S.,
Helliwell J.R., Pickersgill R.W., White S.W.
EMBO J. 2:1009-1014(1983).

[1681] 739. (7tm_1) G-protein coupled receptors [1 to 4,E1,E2] (also called R7G) are an extensive group of hormones,
neurotransmitters, odorants and light receptors which transduce extracellular signals by interaction with guanine
nucleotide-binding (G) proteins. The receptors that are currently known to belong to this family are listed below.

- 5-hydroxytryptamine (serotonin) 1A to 1F, 2A to 2C, 4, 5A, 5B, 6 and 7 [5].

- Acetylcholine, muscarinic-type, M1 to M5.

- Adenosine A1, A2A, A2B and A3 [6].

- Adrenergic alpha-1A to -1C; alpha-2A to -2D; beta-1 to -3 [7].

- Angiotensin II types I and II.

- Bombesin subtypes 3 and 4.

- Bradykinin B1 and B2.

- C3a and C5a anaphylatoxin.

- Cannabinoid CB1 and CB2.

- Chemokines C-C CC-CKR-1 to CC-CKR-8.

- Chemokines C-X-C CXC-CKR-1 to CXC-CKR-4.

- Cholecystokinin-A and cholecystokinin-B/gastrin.

- Dopamine D1 to D5 [8].

- Endothelin ET-a and ET-b [9].

- fMet-Leu-Phe (fMLP) (N-formyl peptide).

- Follicle stimulating hormone (FSH-R) [10].

- Galanin.

- Gastrin-releasing peptide (GRP-R).

- Gonadotropin-releasing hormone (GNRH-R).

- Histamine H1 and H2 (gastric receptor I).

- Lutropin-choriogonadotropic hormone (LSH-R) [10].

- Melanocortin MC1R to MC5R.

- Melatonin.

- Neuromedin B (NMB-R).

- Neuromedin K (NK-3R).

- Neuropeptide Y types 1 to 6.

- Neurotensin (NT-R).

- Octopamine (tyramine), from insects.

- Odorants [11].

- Opioids delta-, kappa- and mu-types [12].

- Oxytocin (OT-R).

- Platelet activating factor (PAF-R).

- Prostacyclin.

- Prostaglandin D2.

- Prostaglandin E2, EP1 to EP4 subtypes.

- Prostaglandin F2.

- Purinoreceptors (ATP) [13].

- Somatostatin types 1 to 5.

- Substance-K (NK-2R).

- Substance-P (NK-1R).

- Thrombin.

- Thromboxane A2.
J Mol Biol 1993;230:1183-1199.

[1673] 735. 3-beta hydroxysteroid dehydrogenase/isomerase family

Structure and tissue-specific expression of 3 beta-hydroxysteroid dehydrogenase/5-ene-4-ene isomerase genes in
human and rat classical and peripheral steroidogenic tissues.
Labrie F, Simard J, Luu-The V, Pelletier G, Belanger A,
Lachance Y, Zhao HF, Labrie C, Breton N, de Launoit Y, et al.
J Steroid Biochem Mol Biol 1992;41:421-435.

The enzyme 3 beta-hydroxysteroid dehydrogenase/5-ene-4-ene isomerase (3 beta-HSD) catalyzes the oxidation and
isomerization of 5-ene-3 beta-hydroxypregnene and 5-ene-hydroxyandrostene steroid precursors into the corresponding
4-ene-ketosteroids necessary for the formation of all classes of steroid hormones.

[1674] 736. 3-hydroxyacyl-CoA dehydrogenase

This family also includes lambda crystallin.

Structure of L-3-hydroxyacyl-coenzyme A dehydrogenase: preliminary chain tracing at 2.8-A resolution.
Birktoft JJ, Holden HM, Hamlin R, Xuong NH, Banaszak LJ;
Proc Natl Acad Sci U S A 1987;84:8262-8266.

3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35) (HCDH) [1] is an enzyme involved in fatty acid metabolism; it
catalyzes the reduction of 3-hydroxyacyl-CoA to 3-oxoacyl-CoA. Most eukaryotic cells have two fatty-acid
beta-oxidation systems, one located in mitochondria and the other in peroxisomes. In peroxisomes 3-hydroxyacyl-CoA
dehydrogenase forms, with enoyl-CoA hydratase (ECH) and 3,2-trans-enoyl-CoA isomerase (ECI), a multifunctional
enzyme where the N-terminal domain bears the hydratase/isomerase activities and the C-terminal domain the
dehydrogenase activity.

Eals^S^t^L'^SiJlc.^^ 

an ECH/ECI domain as well as a 3-hydroxybutyryl-CoA epimerase domain [2].

[1677] The other proteins structurally related to HCDH are:

- Bacterial 3-hydroxybutyryl-CoA dehydrogenase (EC 1.1.1.157), which reduces 3-hydroxybutanoyl-CoA to acetoacetyl-CoA.
- Eye lens protein lambda-crystallin [4], which is specific to lagomorphs (such as rabbit).

There are two major regions of similarity in the sequences of proteins of the HCDH family, the first one located in the

■ SlSS'"" '^"^'"^ '°'^^W2)^GAI-F^UVMFY^x^^lT)-R-x(3)^PAHUVI^(2)-^^^^ 

[ 1] Birktoft J.J., Holden H.M., Hamlin R., Xuong N.-H., Banaszak L.J. Proc. Natl. Acad. Sci. U.S.A. 84:8262-8266(1987).
[ 2] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:4937-4937(1990).

[ 3] Mullany P., Clayton C.L., Pallen M.J., Slone R., Al-Saleh A., Tabaqchali S. FEMS Microbiol. Lett. 124:61-67

" ■ * >'«"9 VV.W. J. Biol. Chem. 263: 

[1678] 737. 60s Acidic ribosomal protein

Proteins P1, P2, and P0, components of the eukaryotic ribosome stalk. New structural and functional aspects.
Remacha M, Jimenez-Diaz A, Santos C, Briones E, Zambrano R,
Rodriguez Gabriel MA, Guarinos E, Ballesta JP;
Biochem Cell Biol 1995;73:959-968.

This family includes archaebacterial L12, eukaryotic P0, P1 and P2.
[1679] 738. 6-phosphogluconate dehydrogenases
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Integral membrane proteins with seven transmembrane regions that belong to family 1 of G-protein coupled receptors. 
[1685] In vertebrates four different pigments are generally found. Rod cells, which mediate vision in dim light, contain
the pigment rhodopsin. Cone cells, which function in bright light, are responsible for color vision and contain three or
more color pigments (for example, in mammals: red, blue and green).

[1686] In Drosophila, the eye is composed of 800 facets or ommatidia. Each ommatidium contains eight photoreceptor
cells (R1-R8): the R1 to R6 cells are outer cells, R7 and R8 inner cells. Each of the three types of cells (R1-R6,
R7 and R8) expresses a specific opsin.

[1687] Proteins evolutionarily related to opsins include squid retinochrome, also known as retinal photoisomerase,
which converts various isomers of retinal into 11-cis retinal, and mammalian retinal pigment epithelium (RPE) RGR [3],
a protein that may also act in retinal isomerization.

[1688] The attachment site for retinal in the above proteins is a conserved lysine residue in the middle of the seventh
transmembrane helix. The pattern that had been developed includes this residue.

- Consensus pattern: [LIVMWAC]-[PGC]-x(3)-[SAC]-K-[STALIMR]-[GSACPNV]-[STACP]-x(2)-[DENF]-[AP]-x(2)-[IY]

[K is the retinal binding site]
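
The consensus patterns quoted throughout this section use the PROSITE notation: `x` for any residue, `[...]` for a set of allowed residues, `{...}` for a set of excluded residues, and `(n)` or `(n,m)` for repetition. As a minimal sketch only, and not part of the original disclosure, such a pattern can be translated into a regular expression and used to locate the retinal-binding lysine. The function names and the test fragment below are illustrative assumptions; the fragment is a rhodopsin-like peptide chosen only because it satisfies the pattern.

```python
import re

# Opsin retinal-binding-site pattern as given in the text above.
OPSIN_PATTERN = (
    "[LIVMWAC]-[PGC]-x(3)-[SAC]-K-[STALIMR]-[GSACPNV]-[STACP]"
    "-x(2)-[DENF]-[AP]-x(2)-[IY]"
)

# One pattern element: x, a single residue, an allowed set or an excluded set,
# optionally followed by a repeat count "(n)" or a range "(n,m)".
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    pieces = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        pieces.append(core)
    return "".join(pieces)


def retinal_lysine_index(sequence: str):
    """Return the 0-based index of the pattern's lysine, or None if no match."""
    m = re.search(prosite_to_regex(OPSIN_PATTERN), sequence)
    # The lysine is the 7th residue of the match: 1 + 1 + 3 + 1 residues precede it.
    return None if m is None else m.start() + 6


if __name__ == "__main__":
    # Hypothetical test fragment (assumption), not a sequence from this document.
    fragment = "GGIPAFFAKTSAVYNPVIY"
    print(prosite_to_regex(OPSIN_PATTERN))
    print(retinal_lysine_index(fragment))
```

A pattern hit is only a screening signal; the text itself notes elsewhere that such patterns can detect unrelated sequences, so a match would normally be confirmed by alignment.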



[ 1] Applebury M.L., Hargrave P.A.
Vision Res. 26:1881-1895(1986).
[ 2] Fryxell K.J., Meyerowitz E.M.
J. Mol. Evol. 33:367-378(1991).

[ 3] Shen D., Jiang M., Hao W., Tao L., Salazar M., Fong H.K.W.
Biochemistry 33:13117-13125(1994).



[1689] The following descriptions of protein family functions are not provided by the Pfam or Prosite databases
[1690] 740. BAH

BAH domain. Number of members: 65

[1] Medline: 97074677. Molecular cloning of polybromo, a nuclear protein containing multiple domains including
five bromodomains, a truncated HMG-box, and two repeats of a novel domain. Nicolas RH, Goodwin GH; Gene
1996;175:233-240.

[2] Medline: 99198739. The BAH (bromo-adjacent homology) domain: a link between DNA methylation, replication
and transcriptional regulation. Callebaut I, Courvalin J-C, Mornon JP; FEBS Lett 1999;446:189-193.

[1691] 741. ELM2.

ELM2 domain. The ELM2 (Egl-27 and MTA1 homology 2) domain is a small domain of unknown function. Number of
members: 10

[1692] 742. Euk_porin. EUKARYOTIC_PORIN The major protein of the outer mitochondrial membrane of eukaryotes
is a porin that forms a voltage-dependent anion-selective channel (VDAC) that behaves as a general diffusion pore
for small hydrophilic molecules [1 to 4]. The channel adopts an open conformation at low or zero membrane potential
and a closed conformation at potentials above 30-40 mV.

This protein contains about 280 amino acids and its sequence is composed of between 12 and 16 beta-strands that span
the mitochondrial outer membrane. Yeast contains two members of this family (genes POR1 and POR2); vertebrates
have at least three members (genes VDAC1, VDAC2 and VDAC3) [5].

A conserved region located at the C-terminal part of these proteins was selected as a signature pattern.
Consensus pattern: [YH]-x(2)-D-[SPCAD]-x-[STA]-x(3)-[TAG]-[KR]-[LIVMF]-[DNSTA]-[DNS]-x(4)-[GSTAN]-[LIV^

[ 1] Benz R. Biochim. Biophys. Acta 1197:167-196(1994).
[ 2] Mannella C.A. Trends Biochem. Sci. 17:315-320(1992).
[ 3] Dihanich M. Experientia 46:146-153(1990).

[ 4] Forte M., Guy H.R., Mannella C.A. J. Bioenerg. Biomembr. 19:341-350(1987).

[ 5] Sampson M.J., Lovell R.S., Davison D.B., Craigen W.J. Genomics 36:192-196(1996).

[1693] 743. Glyco_hydro_19
Chitinases family 19 signatures

cross-reference(s) CHITINASE_19_1, CHITINASE_19_2

Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the beta-1,4-N-acetyl-D-glucosamine linkages
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Thyrotropin fTSH-R) (IQJ. 
- Thyrotropin releasing factor (TRH-R).
- Vasopressin V1a, V1b and V2.
- Visual pigments (opsins and rhodopsin) [14].
- Proto-oncogene mas.



Caenothabtfrtis elegans putative receptors C06G4 5 C38C10 1 Caar^ T^n^' . , 
- ECRF3, a putative receptor encoded in the genome of herpesvirus saimiri.

[1682] The structure of all these receptors is thought to be identical. They have seven hydrophobic regions, each of
which most probably spans the membrane. The N-terminus is located on the extracellular side of the membrane and
is often glycosylated, while the C-terminus is cytoplasmic and generally phosphorylated. Three extracellular loops
alternate with three intracellular loops to link the seven transmembrane regions. Most, but not all, of these receptors
lack a signal peptide. The most conserved parts of these proteins are the transmembrane regions and the first two
cytoplasmic loops. A conserved acidic-Arg-aromatic triplet is present in the N-terminal extremity of the second
cytoplasmic loop [15] and could be implicated in the interaction with G proteins.



[ 1] Strosberg A.D.
Eur. J. Biochem. 196:1-10(1991).
[ 2] Kerlavage A.R.
Curr. Opin. Struct. Biol. 1:394-401(1991).
[ 3] Probst W.C., Snyder L.A., Schuster D.I., Brosius J., Sealfon S.C.
DNA Cell Biol. 11:1-20(1992).
[ 4] Savarese T.M., Fraser C.M.
Biochem. J. 283:1-9(1992).
[ 5] Branchek T.
Curr. Biol. 3:315-317(1993).
[ 6] Stiles G.L.
J. Biol. Chem. 267:6451-6454(1992).
[ 7] Friell T., Kobilka B.K., Lefkowitz R.J., Caron M.G.
Trends Neurosci. 11:321-324(1988).
[ 8] Stevens C.F.
Curr. Biol. 1:20-22(1991).
[ 9] Sakurai T., Yanagisawa M., Masaki T.
Trends Pharmacol. Sci. 13:103-107(1992).
[10] Salesse R., Remy J.J., Levin J.M., Jallal B., Garnier J.
Biochimie 73:109-120(1991).
[11] Lancet D., Ben-Arie N.
Curr. Biol. 3:668-674(1993).
[12] Uhl G.R., Childers S., Pasternak G.
Trends Neurosci. 17:89-93(1994).
[13] Barnard E.A., Burnstock G., Webb T.E.
Trends Pharmacol. Sci. 15:67-70(1994).
[14] Applebury M.L., Hargrave P.A.
Vision Res. 26:1881-1895(1986).
[15] Attwood T.K., Eliopoulos E.E., Findlay J.B.C.
Gene 98:153-159(1991).



[1684] (7tm_1) Visual pigments (opsins) retinal binding site
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- Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a calpain-like domain. 

- Yeast thiol protease BLH1/YCP1/LAP3.

- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein.

[1697] Two bacterial peptidases are also part of this family:

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5].
- Thiol protease tpr from Porphyromonas gingivalis.

[1698] Three other proteins are structurally related to this family, but may have lost their proteolytic activity.

- Soybean oil body protein P34. This protein has its active site cysteine replaced by a glycine.

- Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L but with the active site cysteine replaced
by a serine. Rat testin should not be confused with mouse testin which is a LIM-domain protein (see
<PDOC00382>).

- Plasmodium falciparum serine-repeat protein (SERA), the major blood stage antigen. This protein of 111 Kd
possesses a C-terminal thiol-protease-like domain [6], but the active site cysteine is replaced by a serine.

The sequences around the three active site residues are well conserved and can be used as signature patterns.

[1699] Consensus pattern: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] [C is the active site residue]

Note: the residue in position 4 of the pattern is almost always cysteine; the only exceptions are calpains (Leu),
bleomycin hydrolase (Ser) and yeast YCP1 (Ser). Note: the residue in position 5 of the pattern is always Gly except
in papaya protease IV where it is Glu.

Consensus pattern: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH] [H is the active site residue]

Consensus pattern: [FYCH]-[WI]-[LIVT]-x-[KRQAG]-N-[ST]-W-x(3)-[FYW]-G-x(2)-G-[LFYV^ [N is the active site residue]

Note: these proteins belong to family C1 (papain-type) and C2 (calpains) in the classification of peptidases [7,E1].
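
The two notes on positions 4 and 5 of the cysteine active-site pattern can be checked directly on a matching peptide. The sketch below is illustrative only: the regular expression is a hand translation of Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV], and both the function name and the papain-like test fragment are assumptions introduced for the example.

```python
import re

# Hand translation of Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV].
CYS_ACTIVE_SITE = re.compile(r"Q.{3}[GE].C[YW].{2}[STAGC][STAGCV]")


def describe_cys_site(sequence: str):
    """Report the residues the notes refer to for the first pattern match.

    Positions are counted from 1 within the matched segment, as in the text:
    position 4 is usually Cys, position 5 is usually Gly, and position 7 is
    the active-site cysteine.
    """
    m = CYS_ACTIVE_SITE.search(sequence)
    if m is None:
        return None
    hit = m.group(0)
    return {
        "match": hit,
        "position_4": hit[3],
        "position_5": hit[4],
        "active_site_index": m.start() + 6,  # 0-based index in the full sequence
    }


if __name__ == "__main__":
    # Hypothetical papain-like fragment (assumption), used only for illustration.
    print(describe_cys_site("NQGSCGSCWAFSAV"))
```

Because position 4 falls inside the x(3) stretch, the pattern itself does not enforce the cysteine there, which is why calpains (Leu) and bleomycin hydrolase (Ser) are still detected.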
[ 1] Dufour E. Biochimie 70:1335-1342(1988).

[ 2] Kirschke H., Barrett A.J., Rawlings N.D. Protein Prof. 2:1587-1643(1995).

[ 3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J. FEBS Lett. 357:129-134(1995).
[ 4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C. J. Biol. Chem. 269:27136-27142(1994).
[ 5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C. Appl. Environ. Microbiol. 59:330-333
(1993).

[ 6] Higgins D.G., McConnell D.J., Sharp P.M. Nature 340:604-604(1989).
[ 7] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

[1700] 746. Peptidase M22

Glycoprotease family signature cross-reference(s) GLYCOPROTEASE

Glycoprotease (GCP) (EC 3.4.24.57) [1], or O-sialoglycoprotein endopeptidase, is a metalloprotease secreted by
Pasteurella haemolytica which specifically cleaves O-sialoglycoproteins such as glycophorin A. The sequence of GCP is
highly similar to the following uncharacterized proteins:

- Escherichia coli hypothetical protein ygjD (ORF-X).
- Bacillus subtilis hypothetical protein ydiE.
- Mycobacterium leprae hypothetical protein U229E.
- Mycobacterium tuberculosis hypothetical protein MtCY78.10.
- Synechocystis strain PCC 6803 hypothetical protein sll0807.
- Methanococcus jannaschii hypothetical protein MJ1130.
- Haloarcula marismortui hypothetical protein in HSH 3'region.
- Yeast hypothetical protein YKR038c.
- Yeast hypothetical protein QRI7.

[1701] One of the conserved regions contains two conserved histidines. It is possible that this region is involved in
coordinating a metal ion such as zinc.

[1702] Consensus pattern: [KR]-[GSAT]-x(4)-[FYWLH]-[DQNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-[AG]-H-[LIVM]
Note: these proteins belong to family M22 in the classification of peptidases [2,E1].
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involved in a disulfide bond. ' ^'^^^ chitinases and which is probabry 

Consensus PattemC.x(4.5).F-Y^ST].x(3HFY]^U VMPJ-x-A^^^^^ 

[16941 Consensus pattenrl[UVMJ^GSA^F.x^STAGJ(2)^U^^FY]-W^FY]^W^^ 

[ 1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).
[ 2] Henrissat B. Biochem. J. 280:309-316(1991).

[1695] 744. MBD
Methyl-CpG binding domain

[1] Medline: 99158138. A mammalian protein with specific demethylase activity for mCpG DNA. Bhattacharya SK,
Ramchandani S, Cervoni N, Szyf M; Nature 1999;397:579-583.

[1696] 745. Peptidase C1

Eukaryotic thiol (cysteine) proteases active sites



- Vertebrate lysosomal cathepsins B (EC 3.4 22 1 ) Hi^C^AOo^R^ i tcr^ o ^ - 
- House-dust mites allergens DerP1 and EurM1.

AC-2). andOstertagia ostertagi (CP-i and CP-3^ ^' "^«^«^'««)ntortus (genes AC-1 and 

- Slime mold cysteine proteinases CP1 and CP2.
- Cruzipain from Trypanosoma cruzi and brucei.

- Trophozoite cysteine proteinase (TCP) from various Plasmodium species.
- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.
- Baculoviruses cathepsin-like enzyme (v-cath).






[1711] Its sequence is moderately conserved between prokaryotes (gene folC) and eukaryotes. We developed two
signature patterns based on the conserved regions which are rich in glycine residues and could play a role in the
catalytic activity and/or in substrate binding.
Description of pattern(s) and/or profile(s)

Consensus pattern: [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-[LIVM](2)-x(3)-[GSK]
Consensus pattern: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2)

[1712] [ 1] Shane B., Garrow T., Brenner A., Chen L., Choi Y.J., Hsu J.C., Stover P. Adv. Exp. Med. Biol. 338:629-634
(1993).

[1713] 750. (Peptidase M3) Neutral zinc metallopeptidases, zinc-binding region signature

The majority of zinc-dependent metallopeptidases (with the notable exception of the carboxypeptidases) share a
common pattern of primary structure [1,2,3] in the part of their sequence involved in the binding of zinc, and can be
grouped together as a superfamily, known as the metzincins, on the basis of this sequence similarity. They can be
classified into a number of distinct families [4,E1] which are listed below along with the proteases which are currently
known to belong to these families.
[1714] Family M1

- Bacterial aminopeptidase N (EC 3.4.11.2) (gene pepN).
- Mammalian aminopeptidase N (EC 3.4.11.2).

- Mammalian glutamyl aminopeptidase (EC 3.4.11.7) (aminopeptidase A). It may play a role in regulating growth
and differentiation of early B-lineage cells.

- Yeast aminopeptidase yscII (gene APE2).

- Yeast alanine/arginine aminopeptidase (gene AAP1).

- Yeast hypothetical protein YIL137c.

- Leukotriene A-4 hydrolase (EC 3.3.2.6). This enzyme is responsible for the hydrolysis of an epoxide moiety of
LTA-4 to form LTB-4; it has been shown that it binds zinc and is capable of peptidase activity.

[1715] Family M2 



- Angiotensin-converting enzyme (EC 3.4.15.1) (dipeptidyl carboxypeptidase I) (ACE), the enzyme responsible for
hydrolyzing angiotensin I to angiotensin II. There are two forms of ACE: a testis-specific isozyme and a somatic
isozyme which has two active centers.

[1716] Family M3

- Thimet oligopeptidase (EC 3.4.24.15), a mammalian enzyme involved in the cytoplasmic degradation of small
peptides.
- Neurolysin (EC 3.4.24.16) (also known as mitochondrial oligopeptidase M or microsomal endopeptidase).
- Mitochondrial intermediate peptidase precursor (EC 3.4.24.59) (MIP). It is involved in the second stage of processing
of some proteins imported in the mitochondrion.
- Yeast saccharolysin (EC 3.4.24.37) (proteinase yscD).
- Escherichia coli and related bacteria dipeptidyl carboxypeptidase (EC 3.4.15.5) (gene dcp).
- Escherichia coli and related bacteria oligopeptidase A (EC 3.4.24.70) (gene opdA or prtC).
- Yeast hypothetical protein YKL134c.



[1717] Family M4 



- Thermostable thermolysins (EC 3.4.24.27), and related thermolabile neutral proteases (bacillolysins) (EC
3.4.24.28) from various species of Bacillus.
- Pseudolysin (EC 3.4.24.26) from Pseudomonas aeruginosa (gene lasB).
- Extracellular elastase from Staphylococcus epidermidis.
- Extracellular protease prt1 from Erwinia carotovora.
- Extracellular minor protease smp from Serratia marcescens.
- Vibriolysin (EC 3.4.24.25) from various species of Vibrio.
- Protease prtA from Listeria monocytogenes.
- Extracellular proteinase proA from Legionella pneumophila.



[1718] Family M5 
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I IJAbdullah K.M.. Lo R.Y.C.. MeUors A. J. Bacterid 173 5597-5603(1 991) 
[ 2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1703] 747. SAM. SAM domain (Sterile alpha motif)

[3] Medline: 99101382. The crystal structure of an Eph receptor SAM domain

[1706] At least two proteins related to tyrosinase are known to exist in mammals:

" '^SZ^^ZZ:;^:^'''' "'^ °' S.6-<t^ydrox.indo,e-2^^,^. acid (DHICA) to 

- TRP-2 (TYRP2) [6], which is the melanogenic enzyme DOPAchrome tautomerase (EC 5.3.3.12) that catalyzes
the conversion of DOPAchrome to DHICA. TRP-2 differs from tyrosinases and TRP-1 in that it binds two zinc ions
instead of copper [7].

[1707] Other proteins that belong to this family are:

- Plants polyphenol oxidases (PPO) (EC 1.10.3.1) which catalyze the oxidation of mono- and o-diphenols to
o-diquinones [8].

- Caenorhabditis elegans hypothetical protein C02C2.1.

Consensus pattern H-x{4.5)-F-{LIVMFTP]-x-fFWl -H n v^^^ n mi wo\ c ril 

[ 1] Lerch K. Prog. Clin. Biol. Res. 256:85-98(1988).

[ 2] Jackman M.P., Hajnal A., Lerch K. Biochem. J. 274:707-713(1991).

[ 3] Linzen B. Naturwissenschaften 76:206-211(1989).

[^S'^K-T^M**!!*^^ '^^^ US-A. 88:244-248(1991) 

[ 5] Kobayashi T., Urabe K., Winder A., Jimenez-Cervantes C., Imokawa G., Brewington T., Solano F.,
Garcia-Borron J.C., Hearing V.J. EMBO J. 13:5818-5825(1994).

[ 6] Jackson I.J., Chambers D.M., Tsukamoto K., Copeland N.G., Gilbert D.J., Jenkins N.A., Hearing V. EMBO J.
11:527-535(1992).

[ 7] Solano F., Martinez-Liarte J.H., Jimenez-Cervantes C., Garcia-Borron J.C., Lozano J.A. Biochem. Biophys.
Res. Commun. 204:1243-1250(1994).

[ 8] Cary J.W., Lax A.R., Flurkey W.H. Plant Mol. Biol. 20:245-253(1992).

[1710] 749. (Mur_ligase) Folylpolyglutamate synthase signatures
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- Snake venom metalloproteinases (6). This subfamily mostly groups proteases that act in hemorrhage. Examples 
are: adamalysin II (EC 3.4.24.46), atrolysin C/D (EC 3.4.24.42), atrolysin E (EC 3.4.24.44), fibrolase (EC 3.4.24.72),
trimerelysin I (EC 3.4.25.52) and II (EC 3.4.25.53).

- Mouse cell surface antigen MS2.

[1728] Family M13

- Mammalian neprilysin (EC 3.4.24.11) (neutral endopeptidase) (NEP).

- Endothelin-converting enzyme 1 (EC 3.4.24.71) (ECE-1), which processes the precursor of endothelin to release the
active peptide.

Kell blood group glycoprotein, a major antigenic protein of erythrocytes. The Kell protein is very probably a zinc 
endopeptidase. 

Peptidase O from Lactococcus lactis (gene pepO). 
[1729] Family M27 

- Clostridial neurotoxins, including tetanus toxin (TeTx) and the various botulinum toxins (BoNT). These toxins are
zinc proteases that block neurotransmitter release by proteolytic cleavage of synaptic proteins such as
synaptobrevins, syntaxin and SNAP-25 [7,8].

[1730] Family M30 

Staphylococcus hyicus neutral metalloprotease. 

[1731] Family M32 

- Thermostable carboxypeptidase 1 (EC 3.4.17.19) (carboxypeptidase Taq), an enzyme from Thermus aquaticus
which is most active at high temperature.

[1732] Family M34

- Lethal factor (LF) from Bacillus anthracis, one of the three proteins composing the anthrax toxin.
[1733] Family M35

- Deuterolysin (EC 3.4.24.39) from Penicillium citrinum and related proteases from various species of Aspergillus.
[1734] Family M36

- Extracellular elastinolytic metalloproteinases from Aspergillus.

[1735] From the tertiary structure of thermolysin, the positions of the residues acting as zinc ligands and those involved
in the catalytic activity are known. Two of the zinc ligands are histidines which are very close together in the sequence;
C-terminal to the first histidine is a glutamic acid residue which acts as a nucleophile and promotes the attack of a
water molecule on the carbonyl carbon of the substrate. A signature pattern which includes the two histidine and the
glutamic acid residues is sufficient to detect this superfamily of proteins.
[1736] Description of pattern(s) and/or profile(s)

Consensus pattern: [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ] [The
two H's are zinc ligands] [E is the active site residue]

Sequences known to belong to this class detected by the pattern: ALL,
except for members of families M5, M7 and M11.

Other sequence(s) detected in SWISS-PROT: 55; including Neurospora
crassa conidiation-specific protein 13 which could be a
zinc-protease.
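
The zinc-binding signature above is the only pattern in this group that uses an exclusion set, {DEHRKP}, meaning that none of those six residues may occupy that position. A minimal sketch of how the exclusion translates into a negated character class is given below. The regular expression is a hand translation of the pattern; the function name and both test peptides are hypothetical and serve only to show one accepted and one rejected case.

```python
import re

# Hand translation of
# [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ];
# the braces become the negated class [^DEHRKP].
ZINC_SITE = re.compile(r"[GSTALIVN].{2}HE[LIVMFYW][^DEHRKP]H.[LIVMFYWGSPQ]")


def zinc_binding_site(sequence: str):
    """Return 0-based indices of the two His ligands and the catalytic Glu."""
    m = ZINC_SITE.search(sequence)
    if m is None:
        return None
    start = m.start()
    return {
        "zinc_ligands": (start + 3, start + 7),  # the two H's of the pattern
        "catalytic_glu": start + 4,              # the E between them
    }


if __name__ == "__main__":
    # Hypothetical peptides (assumptions): the second one places Glu at the
    # excluded position and is therefore rejected.
    print(zinc_binding_site("VVAHELTHAV"))
    print(zinc_binding_site("VVAHELEHAV"))
```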



[ 1] Jongeneel C.V., Bouvier J., Bairoch A.

FEBS Lett. 242:211-214(1989).

[ 2] Murphy G.J.P., Murphy G., Reynolds J.J.
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- Mycolystn (EC 3.4.24.31 ) from Streplomyces cacaol 
[1719] Family M6 



[1720] Family M7 

- Streptomyces extracellular small neutral proteases 
[1721] Family M8 

- Leishmanolysin (EC 3.4.24.36) (surface glycoprotein gp63), a cell surface protease from various species of Leishmania.

[1722] Family M9 

- Microbial collagenase (EC 3.4.24.3) from Clostridium perfringens and Vibrio alginolyticus.
[1723] Family M10A






- Serralysin (EC 3.4.24.40), an extracellular metalloprotease from Serratia.
- Alkaline metalloproteinase from Pseudomonas aeruginosa (gene aprA).
- Secreted proteases A, B, C and G from Erwinia chrysanthemi.
- Yeast hypothetical protein YIL108w.

[1724] Family M10B



lagenase). MMP-2 (EC 3.4.24.24) (72 Kd gelatrase), MMP-9 (EC 3.4.24 35) (92 Kd aalatina^..^ mmd 7 ,er 
il^lalteir (^^«^^'y^'"-2). and MMP-II (stromelysin-S). MMP-12 (EC 3.4.24.65) (macrophage met- 

- Sea urchin hatching enzyme (envelysin) (EC 3.4.24.12). A protease that allows the embryo to digest the protective
envelope derived from the egg extracellular matrix.
- Soybean metalloendoproteinase 1.

[1725] Family M11

- Chlamydomonas reinhardtii gamete lytic enzyme (GLE).
[1726] Family M12A

- Astacin (EC 3.4.24.21), a crayfish endoprotease.



- Meprin A (EC 3.4.24.18), a mammalian kidney and intestinal brush border metalloendopeptidase.
- Bone morphogenic protein 1 (BMP-1), a protein which induces cartilage and bone formation and which expresses

pur^uratus Paracentrotus l«r«lus and the related protein SpAN from Strongylocentrotus 



Caenorhabditis elegans protein toh-2.
Caenorhabditis elegans hypothetical protein F42A10.8 



- Choriolysins L and H (EC 3.4.24.67) (also known as embryonic hatching proteins LCE and HCE) from the fish
Oryzias latipes. These proteases participate in the breakdown of the egg envelope, which is derived from the
egg extracellular matrix, at the time of hatching.
[1727] Family M12B
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laecalis and synA from Synechococcus contain one copy of the HMA domain. The cadmium ATPases cadA from 
Bacillus firmus and from plasmid pI258 from Staphylococcus aureus also contain a single HMA domain, while a
chromosomal Staphylococcus aureus cadA contains two copies. Other, less characterized ATPases that contain
the HMA domain are: fixI from Rhizobium meliloti, pacS from Synechococcus strain PCC 7942, Mycobacterium
leprae ctpA and ctpB and Escherichia coli hypothetical protein yhhO. In all these ATPases the HMA domain(s) are
located in the N-terminal section.

- Mercuric reductase (EC 1.16.1.1) (gene merA) which is generally encoded by plasmids carried by mercury-resistant
Gram-negative bacteria. Mercuric reductase is a class-1 pyridine nucleotide-disulphide oxidoreductase (see
<PDOC00073>). There is generally one HMA domain (with the exception of a chromosomal merA from Bacillus
strain RC607 which has two) in the N-terminal part of merA.

- Mercuric transport protein periplasmic component (gene merP), also encoded by plasmids carried by
mercury-resistant Gram-negative bacteria. It seems to be a mercury scavenger that specifically binds to one Hg(2+) ion
and which passes it to the mercuric reductase via the merT protein. The N-terminal half of merP is a HMA domain.
- Helicobacter pylori copper-binding protein copP.

- Yeast protein ATX1 [2], which could act in the transport and/or partitioning of copper.

[1746] The consensus pattern for HMA spans the complete domain.
[1747] Description of pattern(s) and/or profile(s)

Consensus pattern: [LIVN]-x(2)-[LIVMFA]-x-C-x-[STAGCDNH]-C-x(3)-[LIVFG]-x(3)-[LIV]-x(9,11)-[IVA]-x-[LVFYS] [The
two C's probably bind metals]
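
Unlike the other signatures in this section, the HMA pattern contains a variable-length gap, x(9,11), so matching segments may be 29 to 31 residues long. The sketch below is an illustration under stated assumptions: the regular expression is a hand translation of the pattern, the function name is hypothetical, and the test peptides are synthetic sequences built to satisfy or violate the spacing rule rather than real HMA domains.

```python
import re

# Hand translation of the HMA pattern; x(9,11) becomes .{9,11}.
HMA = re.compile(
    r"[LIVN].{2}[LIVMFA].C.[STAGCDNH]C.{3}[LIVFG].{3}[LIV].{9,11}[IVA].[LVFYS]"
)


def metal_binding_cysteines(sequence: str):
    """Return 0-based indices of the two conserved cysteines, or None.

    Both cysteines precede the variable gap, so their offsets from the start
    of the match are fixed at 5 and 8 whatever the gap length.
    """
    m = HMA.search(sequence)
    if m is None:
        return None
    return (m.start() + 5, m.start() + 8)


if __name__ == "__main__":
    # Synthetic peptides (assumptions): the first has a 9-residue gap, the
    # second an 8-residue gap that falls outside the allowed 9 to 11 range.
    print(metal_binding_cysteines("VEGMTCASCVSRIEKALKKLPGVESAVNL"))
    print(metal_binding_cysteines("VEGMTCASCVSRIEKALKKLPGVESVNL"))
```

The C-x(2)-C spacing of the two cysteines is the part of the pattern the text links to metal binding; the remaining positions mainly constrain the surrounding fold.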



[ 1] Bull P.C., Cox D.W. Trends Genet. 10:246-252(1994).

[ 2] Lin S.-J., Culotta V.L. Proc. Natl. Acad. Sci. U.S.A. 92:3784-3788(1995).



[1748] 756. (Peptidase M10) Matrixins cysteine switch
PROSITE cross-reference(s): CYSTEINE_SWITCH

Mammalian extracellular matrix metalloproteinases (EC 3.4.24.-), also known as matrixins [1] (see <PDOC00129>),
are zinc-dependent enzymes. They are secreted by cells in an inactive form (zymogen) that differs from the mature
enzyme by the presence of an N-terminal propeptide. A highly conserved octapeptide is found two residues downstream
of the C-terminal end of the propeptide. This region has been shown to be involved in autoinhibition of matrixins [2,3];
a cysteine within the octapeptide chelates the active site zinc ion, thus inhibiting the enzyme. This region has been
called the 'cysteine switch' or 'autoinhibitor region'.
A cysteine switch has been found in the following zinc proteases:

- MMP-1 (EC 3.4.24.7) (interstitial collagenase).
- MMP-2 (EC 3.4.24.24) (72 Kd gelatinase).
- MMP-3 (EC 3.4.24.17) (stromelysin-1).
- MMP-7 (EC 3.4.24.23) (matrilysin).
- MMP-8 (EC 3.4.24.34) (neutrophil collagenase).
- MMP-9 (EC 3.4.24.35) (92 Kd gelatinase).
- MMP-10 (EC 3.4.24.22) (stromelysin-2).
- MMP-11 (EC 3.4.24.-) (stromelysin-3).
- MMP-12 (EC 3.4.24.65) (macrophage metalloelastase).
- MMP-13 (EC 3.4.24.-) (collagenase 3).
- MMP-14 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 1).
- MMP-15 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 2).
- MMP-16 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 3).
- Sea urchin hatching enzyme (EC 3.4.24.12) (envelysin) [4].
- Chlamydomonas reinhardtii gamete lytic enzyme (GLE) [5].

[1749] Description of pattern(s) and/or profile(s)

Consensus pattern: P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ] [C chelates the zinc ion]
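
Because the cysteine switch is a fixed-length octapeptide, scanning for it reduces to a single short regular expression. The following sketch is illustrative only: the expression is a hand translation of P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ], the function name is hypothetical, and the flanking alanines in the test peptides are filler rather than real propeptide sequence.

```python
import re

# Hand translation of P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ].
CYSTEINE_SWITCH = re.compile(r"PRC[GN].P[DR][LIVSAPKQ]")


def find_cysteine_switch(sequence: str):
    """Return (octapeptide, 0-based index of the zinc-chelating Cys) or None."""
    m = CYSTEINE_SWITCH.search(sequence)
    if m is None:
        return None
    # The cysteine is the third residue of the octapeptide.
    return m.group(0), m.start() + 2


if __name__ == "__main__":
    # Synthetic peptides (assumptions) with and without the chelating cysteine.
    print(find_cysteine_switch("AAPRCGVPDVAA"))
    print(find_cysteine_switch("AAPRSGVPDVAA"))
```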
[ 1] Woessner J. Jr. FASEB J. 5:2145-2154(1991).

[ 2] Sanchez-Lopez R., Nicholson R., Gesnel M.C., Matrisian L.M., Breathnach R. J. Biol. Chem. 263:11892-11899
(1988).

[ 3] Park A.J., Matrisian L.M., Kells A.F., Pearson R., Yuan Z., Navre M. J. Biol. Chem. 266:1584-1590(1991).
[ 4] Lepage T., Gache C. EMBO J. 9:3003-3012(1990).
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FEBS Lett. 289:4-7(1991). 

[ 3] Bode W., Grams F., Reinemer P., Gomis-Rueth F.-X., Baumann U., McKay
D.B., Stoecker W.

Zoology 99:237-246(1996).

[ 4] Rawlings N.D., Barrett A.J.

Meth. Enzymol. 248:183-228(1995).

[ 5] Woessner J. Jr.

FASEB J. 5:2145-2154(1991).

[ 6] Hite L.A., Fox J.W., Bjarnason J.B.

[ 7] Montecucco C., Schiavo G.

Trends Biochem. Sci. 18:324-327(1993).

[ 8] Niemann H., Blasi J., Jahn R.

Trends Cell Biol. 4:179-185(1994).

[1737] 751. PseudoU_synt_1 

tRNA pseudouridine synthase. Involved in the formation of pseudouridine at the anticodon stem and loop of transfer-RNAs.
Pseudouridine is an isomer of uridine (5-(beta-D-ribofuranosyl) uracil), and is the most abundant modified nucleoside
found in all cellular RNAs. The TruA-like proteins also exhibit a conserved sequence with a strictly conserved
aspartic acid, likely involved in catalysis. Number of members: 25

Il?fL'^r^^*" ^^^3- '^'^^^ RNA-pseudouridine synthetase Pusi of Saccaromyces cerevisiae contains 
one atom of zinc essential for its native conformation and tRNA recognition. Arluison V, Hountondji C, Robert B,
Grosjean H; Biochemistry 1998;37:7268-7276.

[1739] 752. EPSP synthase signatures 

fh!!i!Z!*!l^^ O^Jhosphoshikimate KarboxyvinylUansferase) (EC ^5.1.19) catalyzes the sixth step in the biosyn- 

jTer Jr„?^°^ HH '^^..^T'" Pa'hway) in bacteria (gene aroA). plants and fungi 

(where it is part of a multrfunctional enzyme which catalyzes five consecutive steps in this pathway) [1 EPSP synthai 

[1740] The sequence of EPSP from various biological sources shows that the structure of the enzyme has been well
conserved throughout evolution. Two conserved regions were selected as signature patterns. The first pattern
corresponds to a region that is part of the active site and which is also important for the resistance to glyphosate [2].

^craSror^,r:,s:;r ' ^ ^ — - ^ 

[1741] Description of pattern(s) and/or profile(s)

[1742] Consensus pattern: [LIVM]-x(2)-[GN]-N-[SA]-G-T-[STA]-x-R-x-[LIVMY]-x-[GSTA]
Consensus pattem[KRl-x4KH]^^CSTlHDNE]fl4UVMWSTAlHUVMC]-x(2HEN]4Uvl^H-x4l^^ 

USI^'k '^ ^'^'I*^ t^ - ^ ^ ■ Leimgruber N.K.. Stegeman RA.. 

f1SZ,« 1' R S^n « ^' "^^^ ® ° """^ Sci. U S A. 88:5O46-5oi0(1 991 ). 

L T 1?!^ S J» Re D.B.. GaserC.a. Eicholte D.A.. FrazierRB.. HironakaC.H/l.. Uvine E.B . Shah D M Fralev 
RT. Kishore G.M. J. Biol. Chem, 266:22364-22369(1 991 ). " ^ 

[1743] 753. Glyco_hydro_18

Glycosyl hydrolases family 18. Number of members: 173

[1] Medline: 95219379. Crystal structure of a bacterial chitinase at 2.3 A resolution. Perrakis A, Tews I, Dauter Z,
Oppenheim AB, Chet I, Wilson KS, Vorgias CE; Structure 1994;2:1169-1180.

[1744] 754. Esterase
Putative esterase

This family contains Esterase D Swiss:P10768. However it is not clear if all members of the family have the same
function. This family is possibly related to the COesterase family.

Number of members: 36

[1745] 755. (HMA) Heavy-metal-associated domain

A conserved domain of about 30 amino acid residues has been found [1] in a number of proteins that transport or
detoxify heavy metals. This domain contains two conserved cysteines that could be involved in the binding of these
metals. The domain has been termed Heavy-Metal-Associated (HMA). It has been found in:

- A variety of cation transport ATPases (E1-E2 ATPases) (see <PDOC00139>). The human copper ATPases ATP7A
and ATP7B, which are respectively involved in Menkes and Wilson diseases, both contain 6
tandem copies of the HMA domain. The copper ATPases CCC2 from budding yeast, copA from Enterococcus
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[ 21Sie2en R.J. (In) Proceeding subtiiisin symposium, Hamburg, (1992). 
[ 3] Barr P.J. Cell 66:1-3(1991).

[ 4] Shaulsky G., Kuspa A., Loomis W.F. Genes Dev. 9:1111-1122(1995).
[ 5] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1752] 758. (SSB) Single-strand binding protein family signatures
PROSITE cross-reference(s): PS00735; SSB_1. PS00736; SSB_2

The Escherichia coli single-strand binding protein [1] (gene ssb), also known as the helix-destabilizing protein, is a
protein of 177 amino acids. It binds tightly, as a homotetramer, to single-stranded DNA (ss-DNA) and plays an important
role in DNA replication, recombination and repair.

[1753] Closely related variants of SSB are encoded in the genome of a variety of large self-transmissible plasmids.
SSB has also been characterized in bacteria such as Proteus mirabilis or Serratia marcescens.
[1754] Eukaryotic mitochondrial proteins that bind ss-DNA and are probably involved in mitochondrial DNA replication
are structurally and evolutionarily related to prokaryotic SSB. Proteins currently known to belong to this subfamily are
listed below [2].

- Mammalian protein Mt-SSB (P16).

- Xenopus Mt-SSBs and Mt-SSBr.

- Drosophila MtSSB.

- Yeast protein RIM1.

[1755] Two signature patterns have been developed for these proteins. The first is a conserved region in the
N-terminal section of the SSB's. The second is a centrally located region which, in Escherichia coli SSB, is known to be
involved in the binding of DNA.

[1756] Description of pattern(s) and/or profile(s)

Consensus pattern(LI\^F)-[NST]-[KRT]-[LIVM]-x.[LfVMF](2)-G-[NHRKl-{LIVfol]-(GST]-^^^^ 
Consensus pattern T-x-W-[HY]-[RNS]-[LIVM]-x-[LIVMF]-[FY]-[NGKR]

[ 1] Meyer R.R., Laine P.S. Microbiol. Rev. 54:342-380(1990).
[ 2] Stroumbakis N.D., Li Z., Tolias P.P. Gene 143:171-177(1994).

[1757] 759. KDPG and KHG aldolases active site signatures

PROSITE cross-reference(s): PS00159; ALDOLASE_KDPG_KHG_1. PS00160; ALDOLASE_KDPG_KHG_2
[1758] 4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) (KHG-aldolase) catalyzes the interconversion of
4-hydroxy-2-oxoglutarate into pyruvate and glyoxylate. Phospho-2-dehydro-3-deoxygluconate aldolase (EC 4.1.2.14)
(KDPG-aldolase) catalyzes the interconversion of 6-phospho-2-dehydro-3-deoxy-D-gluconate into pyruvate and
glyceraldehyde 3-phosphate.

[1759] These two enzymes are structurally and functionally related [1]. They are both homotrimeric proteins of
approximately 220 amino-acid residues. They are class I aldolases whose catalytic mechanism involves the formation
of a Schiff-base intermediate between the substrate and the epsilon-amino group of a lysine residue. In both enzymes
an arginine is required for catalytic activity.

[1760] Two signature patterns were developed for these enzymes. The first one contains the active site arginine and
the second, the lysine involved in the Schiff-base formation.
[1761] Description of pattern(s) and/or profile(s)

Consensus pattern G-[LIVM]-x(3)-E-[LIV]-T-[LF]-R [R is the active site residue]
Consensus pattern G-x(3)-[LIVMF]-K-[LF]-F-P-[SA]-x(3)-G [K is involved in Schiff-base formation]
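
The consensus patterns in these entries follow the PROSITE notation: positions are separated by hyphens, x stands for any residue, [..] lists the residues accepted at a position, {..} lists the residues excluded at a position, and a trailing (n) or (n,m) gives a repeat count. As a minimal sketch, assuming only those syntax elements, a pattern can be translated into a regular expression and searched against a sequence. The function name prosite_to_regex and the toy peptide below are hypothetical illustrations and are not taken from the specification.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles only x, [..], {..}, single residues, and (n) / (n,m) repeats.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate the residue specification from an optional repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"    # excluded residues
        # "[ABC]" and single letters are already valid regex fragments.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)


# First KDPG/KHG aldolase signature as given in the text.
KDPG_1 = "G-[LIVM]-x(3)-E-[LIV]-T-[LF]-R"

# Hypothetical toy peptide, used only to exercise the pattern.
toy_peptide = "AAGLAAAEITLRAA"
hit = re.search(prosite_to_regex(KDPG_1), toy_peptide)
```

A regular expression only reproduces pattern matching; it does not replace the profile-based searches mentioned elsewhere in these entries.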

[1762] [ 1] Vlahos C J.. Dekker E.E. J. Biol. Chem. 263:11683-11691(1988). 

[1763] 760. AP endonucleases family 1 signatures. PROSITE cross-reference(s): PS00726; AP_NUCLEASE_F1_1.
PS00727; AP_NUCLEASE_F1_2. PS00728; AP_NUCLEASE_F1_3

[1764] DNA damaging agents such as the antitumor drugs bleomycin and neocarzinostatin or those that generate
oxygen radicals produce a variety of lesions in DNA. Amongst these is base-loss which forms apurinic/apyrimidinic
(AP) sites or strand breaks with atypical 3' termini. DNA repair at the AP sites is initiated by specific endonuclease
cleavage of the phosphodiester backbone. Such endonucleases are also generally capable of removing blocking
groups from the 3' terminus of DNA strand breaks.

[1765] AP endonucleases can be classified into two families on the basis of sequence similarity. Family 1 groups
the enzymes listed below [1].
[ 5] Kinoshita T., Fukuzawa H., Shimada T., Saito T., Matsubara Y. Proc. Natl. Acad. Sci. U.S.A. 89:4693-4697(1992).

PrStpT ■ ^''^f^"^ ^^""^ ^^^'l^^* Sites 

PROSITE cross-reference(s): PS00136; SUBTILASE_ASP. PS00137; SUBTILASE_HIS. PS00138; SUBTILASE_SER

Subtilases [1,2] are an extensive family of serine proteases whose catalytic activity is provided by a charge relay system
similar to that of the trypsin family of serine proteases but which evolved by independent convergent evolution. The
sequence around the residues involved in the catalytic triad (aspartic acid, serine and histidine) are completely different
from that of the analogous residues in the trypsin serine proteases and can be used as signatures specific to that
category of proteases.

The subtilase family currently includes the following proteases: 

■ sts^Ltie'^i;!^?;;^'^ 

- Alkaline elastase YaB from Bacillus sp. (gene ale).
- Alkaline serine exoprotease A from Vibrio alginolyticus (gene proA).
- Aqualysin I from Thermus aquaticus (gene pstI).
- AspA from Aeromonas salmonicida.
- Bacillopeptidase F (esterase) from Bacillus subtilis (gene bpf).
- C5A peptidase from Streptococcus pyogenes (gene scpA).

- ^r^^*"^^ Lactococcus lactis. 
- Extracellular serine protease from Serratia marcescens.
- Extracellular protease from Xanthomonas campestris.
- Intracellular serine protease (ISP) from various Bacillus.
- Minor extracellular serine protease epr from Bacillus subtilis (gene epr).
- Minor extracellular serine protease vpr from Bacillus subtilis (gene vpr).
- Nisin leader peptide processing protease nisP from Lactococcus lactis.

' Sr"*",^ ^"^^"^ ' Pasteurella haemolytica (gene ssan' 

- Thermitase (EC 3.4.21.66) from Thermoactinomyces vulgaris.
- Calcium-dependent protease from Anabaena variabilis (gene prcA).
- Halolysin from halophilic bacteria sp. 172p1 (gene hly).
- Alkaline extracellular protease (AEP) from Yarrowia lipolytica (gene xpr2).
- Alkaline proteinase from Cephalosporium acremonium (gene alp).
- Cerevisin (EC 3.4.21.48) (vacuolar protease B) from yeast (gene PRB1).
- Cuticle-degrading protease (pr1) from Metarhizium anisopliae.
- KEX-1 protease from Kluyveromyces lactis.
- Kexin (EC 3.4.21.61) from yeast (gene KEX-2).
- Oryzin (EC 3.4.21.63) (alkaline proteinase) from Aspergillus (gene alp).
- Proteinase K (EC 3.4.21.64) from Tritirachium album (gene proK).
- Proteinase R from Tritirachium album (gene proR).
- Proteinase T from Tritirachium album (gene proT).
- Subtilisin-like protease III from yeast (gene YSP3).
- Thermomycolin (EC 3.4.21.65) from Malbranchea sulfurea.

- "^"""(tC 3.4.21.85), neuroendocrine convortasesi to 3 fNEC-i to n^nHDA^.^. . 
vertebrates, and invertebrates. These proteasTare iZv^ n^h^ 

comprised of pairs of basic amino ackJ ^SSue^sr ''"^^"'^ "^""^ ^^'^ ^ 

- Tripeptidyl-peptidase II (tripeptidyl aminopeptidase) from Human.
- Prestalk-specific proteins tagB and tagC from slime mold [4]. Both proteins consist of two domains: a N-terminal
subtilase catalytic domain and a C-terminal ABC transporter domain (see <PDOC00185>).

[1751] Description of pattern(s) and/or profile(s)

Consensus pattem[STAIVhx4UVMF]-{UVIVl]-DHDSTA]^4UVMFCI-xf2 3WDNHI rn • «, .- 
Consensus PattemH^^STMJ-x-{WC)^STAGCHGS^x^^VlX^^^ rt T'"^ 

'ot l: sSrseTr ris" r ^ ^^^^ ^'^^^^ ^^^^-^ - ■'^^^^ ser^e protease 

Note: these proteins belong to family S8 in the classification of peptidases [5,E1].

[ 1] Siezen R.J., de Vos W.M., Leunissen J.A.M., Dijkstra B.W. Protein Eng. 4:719-737(1991).
'C': conserved cysteine involved in a disulfide bond.
'c': optional cysteine involved in a disulfide bond.
'*': position of the pattern.



[1782] The categories of proteins in which the CTL domain has been found are listed below.

[1783] Type-II membrane proteins where the CTL domain is located at the C-terminal extremity of the proteins:

- Asialoglycoprotein receptors (ASGPR) (also known as hepatic lectins) [4]. The ASGPR's mediate the endocytosis
of plasma glycoproteins from which the terminal sialic acid residue in their carbohydrate moieties has been removed.

- Low affinity immunoglobulin epsilon Fc receptor (lymphocyte IgE receptor), which plays an essential role in the
regulation of IgE production and in the differentiation of B cells.

- Kupffer cell receptor. A receptor with an affinity for galactose and fucose, that could be involved in endocytosis.

- A number of proteins expressed on the surface of natural killer T-cells: NKG2, NKR-P1, YE1/88 (Ly-49), CD69
and on B-cells: CD72, LyB-2. The CTL-domain in these proteins is distantly related to other CTL-domains; it is
unclear whether they are likely to bind carbohydrates.

[1784] Proteins that consist of an N-terminal collagenous domain followed by a CTL-domain [5]; these proteins are
sometimes called 'collectins':

- Pulmonary surfactant-associated protein A (SP-A). SP-A is a calcium-dependent protein that binds to surfactant
phospholipids and contributes to lower the surface tension at the air-liquid interface in the alveoli of the mammalian
lung.

- Pulmonary surfactant-associated protein D (SP-D).

- Conglutinin, a calcium-dependent lectin-like protein which binds to a yeast cell wall extract and to immune
complexes through the complement component (iC3b).

- Mannan-binding proteins (MBP) (also known as mannose-binding proteins). MBP's bind mannose and
N-acetyl-D-glucosamine in a calcium-dependent manner.

- Bovine collectin-43 (CL-43).

[1785] Selectins (or LEC-CAM) [6,7]. Selectins are cell adhesion molecules implicated in the interaction of leukocytes
with platelets or vascular endothelium. Structurally, selectins consist of a long extracellular domain, followed by a
transmembrane region and a short cytoplasmic domain. The extracellular domain is itself composed of a CTL-domain,
followed by an EGF-like domain and a variable number of SCR/Sushi repeats. Known selectins are:

- Lymph node homing receptor (also known as L-selectin, leukocyte adhesion molecule-1 (LAM-1), leu-8,
gp90-mel, or LECAM-1).
- Escherichia coli exonuclease III (EC 3.1.11.2) (gene xthA).

- Streptococcus pneumoniae and Bacillus subtilis exonuclease A (gene exoA).

- Mammalian AP endonuclease 1 (AP1) (EC 4.2.99.18).

- Drosophila recombination repair protein 1 (gene Rrp1).

- Arabidopsis thaliana apurinic endonuclease-redox protein (gene arp).

[1766] Except for Rrp1 and arp, these enzymes are proteins of about 300 amino-acid residues. Rrp1 and arp both
contain additional and unrelated sequences in their N-terminal section (about 400 residues for Rrp1 and 270 for arp).
[1767] Three signature patterns were developed for this family of enzymes. The patterns are based on the most
conserved regions. The first pattern contains a glutamate which has been shown [2], in the Escherichia coli enzyme,
to bind a divalent metal ion such as magnesium or manganese.

[1768] Consensus pattern [APF]-D-[LIVMF](2)-x-[LIVM]-Q-E-x-K [E binds a divalent metal ion]
Consensus pattern D-[ST]-[FY]-R-[KH]-x(7,8)-[FYW]-[ST]-[FYW](2)
Consensus pattern N-x-G-x-R-[LIVM]-D-[LIVMFYH]-x-[LV]-x-S

[ 1] Barzilay G., Hickson I.D. BioEssays 17:713-719(1995).

[ 2] Mol C.D., Kuo C.-F., Thayer M.M., Cunningham R.P., Tainer J.A. Nature 374:381-386(1995).

[1769] 761. (ER) Enhancer of rudimentary signature. PROSITE cross-reference(s): PS01290; ER
[1770] The Drosophila protein 'enhancer of rudimentary' (gene e(r)) is a small protein of 104 residues whose function
is not yet clear. From an evolutionary point of view, it is highly conserved [1] and has been found to exist in probably
all multicellular eukaryotic organisms. It has been proposed that this protein plays a role in the cell cycle.

[1771] A conserved region in the central part of the protein was selected as a signature pattern.
[1772] Consensus pattern Y-D-I-[SA]-x-L-[FY]-x-F-[IV]-D-x(3)-D-[LIV]-S

[Jml yrrSpT.nh''^ ' ^ - S I. Gene 186:189-195(1997) 

pZ^EW^hT^ "^"""^ alpha^bunit signature. PROSITE cross-.ef;rence(s): 

The electron transfer flavoprotein (ETF) [1,2] serves as a specific electron acceptor for various mitochondrial
dehydrogenases. ETF transfers electrons to the main respiratory chain via ETF-ubiquinone oxidoreductase. ETF is an
heterodimer that consists of an alpha and a beta subunit and which binds one molecule of FAD per dimer. A similar
system also exists in some bacteria.

Escherichia coli hypothetical protein ydiR. 
Escherichia coli hypothetical protein ygcQ. 

S^e protJir ^^'^ "^^'^ "^'"^^ ^'"^ a signature paUem for 

[1779] Consensus pattern [LI]-Y-[LIVM]-[AT]-x-G-[IV]-[SD]-G-x-[IV]-Q-H-x(2)-G-x(6)-[IV]-x-A-[IV]-N

! II t''°T^Z^ ' '"^ ^' "° ^ ■ '^^"^ P'°9- Biol. Res. 321:637-652(1990) 
I isai M.H.. Saier M.H. Jr. Res. Microbiol. 146:397-404(1 995). 

[1780] 763. (lectin c) C-type lectin domain signature and profile 

PROSITE cross-reference(s): PS00615; C_TYPE_LECTIN_1. PS50041; C_TYPE_LECTIN_2

£1781] A number of different families of proteins share a consented domain whl^ was fi»t m ■ 

animanect^s and wh«h seem to function as a cateium-dependent ^S^J^Z^'^^' " 
than the pattern, you should use it if you have access to the necessary software tools to do so.

[ 1] Drickamer K. J. Biol. Chem. 263:9557-9560(1988).

[ 2] Drickamer K. Prog. Nucleic Acid Res. Mol. Biol. 45:207-232(1993).

[ 3] Drickamer K. Curr. Opin. Struct. Biol. 3:393-400(1993).

[ 4] Spiess M. Biochemistry 29:10009-10018(1990).

[ 5] Weis W.I., Kahn R., Fourme R., Drickamer K., Hendrickson W.A. Science 254:1608-1615(1991).

[ 6] Siegelman M. Curr. Biol. 1:125-128(1991).

[ 7] Lasky L.A. Science 258:964-969(1992).

[ 8] Jomori T., Natori S. J. Biol. Chem. 266:13318-13323(1991).

[ 9] Ng N.F.L., Hew C.-L. J. Biol. Chem. 267:16069-16075(1992).

[1793] 764. (SRCR) Speract receptor repeated domain signature
PROSITE cross-reference(s): PS00420; SPERACT_RECEPTOR

[1794] The receptor for the sea urchin egg peptide speract is a transmembrane glycoprotein of 500 amino acid
residues [1]. Structurally it consists of a large extracellular domain of 450 residues, followed by a transmembrane
region and a small cytoplasmic domain of 12 amino acids. The extracellular domain contains four repeats of a 115
amino acid domain. There are 17 positions that are perfectly conserved in the four repeats; among them are six
cysteines, six glycines, and three glutamates.

[1795] Such a domain is also found, once, in the C-terminal section of mammalian macrophage scavenger receptor
type I [2], a membrane glycoprotein implicated in the pathologic deposition of cholesterol in arterial walls during
atherogenesis.

[1796] The signature pattern that was derived spans part of the N-terminal section of the domain and contains 8 of
the 17 conserved residues.

[1797] Consensus pattern G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G
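
The statement that the signature contains 8 of the 17 conserved residues can be checked directly from the pattern by counting the positions that name a single residue. The sketch below is an illustration only; the helper name summarize_pattern is hypothetical, and it assumes the pattern uses only single residues, x, bracketed residue sets, and (n) or (n,m) repeat counts.

```python
import re
from collections import Counter

# SRCR signature as given in the text.
SRCR = "G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G"


def summarize_pattern(pattern):
    """Return (counts of fully specified residues, minimum pattern length)."""
    fixed = Counter()
    min_length = 0
    for element in pattern.split("-"):
        # For a range such as (7,8) only the lower bound is used.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,\d+)?\))?", element)
        core, repeat = m.group(1), int(m.group(2) or 1)
        min_length += repeat
        # A single upper-case letter is a fully specified position;
        # "x" and bracketed sets are not counted as fixed.
        if len(core) == 1 and core.isupper():
            fixed[core] += repeat
    return fixed, min_length


fixed_residues, span = summarize_pattern(SRCR)
number_fixed = sum(fixed_residues.values())
```

Under these assumptions the count of fixed positions agrees with the figure of 8 given above; the position written [FYW] accepts three residues and is therefore not counted.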

[ 1] Dangott J.J., Jordan J.E., Bellet R.A., Garbers D.L. Proc. Natl. Acad. Sci. U.S.A. 86:2128-2132(1989).

[ 2] Freeman M., Ashkenas J., Rees D.J., Kingsley D.M., Copeland N.G., Jenkins N.A., Krieger M. Proc. Natl.
Acad. Sci. U.S.A. 87:8810-8814(1990).

[1798] 765. Bac_surface_Ag
Bacterial surface antigen

This entry includes the following surface antigens: D15 antigen from H.influenzae, OMA87 from P.multocida, OMP85
from N.meningitidis and N.gonorrhoeae. Number of members: 14

[1] Medline: 95255676. The sequencing of the 80-kDa D15 protective surface antigen of Haemophilus influenzae.
Flack FS, Loosmore S, Chong P, Thomas WR; Gene 1995;156:97-99.

[2] Medline: 96333354. Cloning, sequencing, expression, and protective capacity of the oma87 gene encoding the
Pasteurella multocida 87-kilodalton outer membrane antigen. Ruffolo CG, Adler B; Infect Immun 1996;64:3161-3167.



[1799] 766. BRCA1 C Terminus (BRCT) domain 

The BRCT domain is found predominantly in proteins involved in cell cycle checkpoint functions responsive to DNA
damage. It has been suggested that the Retinoblastoma protein contains a divergent BRCT domain; this has not been
included in this family. The BRCT domain of XRCC1 forms a homodimer in the crystal structure (Medline: 99016060).
This suggests that pairs of BRCT domains associate as homo- or heterodimers. Number of members: 131

[1] Medline: 96259550. BRCA1 protein products ...Functional motifs... Koonin EV, Altschul SF, Bork P; Nature
Genet 1996;13:266-268.

[2] Medline: 97153217. From BRCA1 to RAP1: A widespread BRCT module closely associated with DNA repair.
Callebaut I, Mornon JP; Febs lett 1997;400:25-30.

[3] Medline: 97186552. A superfamily of conserved domains in DNA damage responsive cell cycle checkpoint
proteins. Bork P, Hofmann K, Bucher P, Neuwald AF, Altschul SF, Koonin EV; Faseb J 1997;11:68-76.

[4] Medline: 97402527. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ; Nucleic Acids Res 1997;25:3389-3402.

[5] Medline: 99016060. Structure of an XRCC1 BRCT domain: a new protein-protein interaction module. Zhang
- Endothelial leukocyte adhesion molecule 1 (ELAM-1, E-selectin or LECAM-2).
[1786] The ligand recognized by ELAM-1 is sialyl-Lewis x.

- Granule membrane protein 140 (GMP-140, P-selectin, PADGEM, CD62, or LECAM-3).
The ligand recognized by GMP-140 is Lewis x.

[1787] Large proteoglycans that contain a CTL-domain followed by one copy of a SCR/Sushi repeat, in their
C-terminal section:

- Aggrecan (cartilage-specific proteoglycan core protein) This oroteooh/can moi^ 

Sfcan '"^^ ^^'^ it has a L in i^^^^^''''^ 

Neurocan. 

[1789] Two type-I membrane proteins:

' '^T°' '"^ "^"^^Phages. This protein mediates the endocytosis of 

bilk rnoth hemocytin, an humoral lectin whirh ic ir.»,«K,«^ ;^ « ^ " """^""^peaisoiine CTLdonrain. 
domains (see <PDOC0098r>) a CTL selt^efence mechanism. It is composed of 2 FA58C 

domain. 2 VWFC domains (see <PIDOC00928), and a CTCK (see <PDOC00912>). 
[1790] Various other proteins that uniquely consist of a CTL-domain:

- Pancreatic stone protein (PSP) (alsTZ^, i^^nTL^ 1^ T"^' mechanisms. 

as an ^ititor o, Sxx,.i^^i£l'^.^^^^ <PTP). or reg). a protein that m^h. act 

- Eos.noph.1 granule major bas.c protein (MBP), a ototoxic prote^ ^ 

- A galactose specific lectin from a rattlesnake. 

■ Two subunits of a coagulation factor IXffactor X-bindinq protein flXOf hn\ =. cn=i,» 

Which binds with factors IX and X in the presence <5 rafcium *^ ant«oagulant protein 

- Two subunits of a phospholipase A2 inhibitor from the plasma of a snake (PU-A and PU m 

- Sea raven antifreeze protein (AFP) [9].
three C's are involved in disulfide bonds] 

..n"H-)^-lL.-M»Mj-x(^:)-C-x(5.6)-[FYWUVSTAHUVMSTAJ-C (The 
Note: this documentation entry is linked to both a signature pattern and a profile. As the profile is much more sensitive
zyme that catalyzes the hydrolysis of GDP to GMP.

- Potato apyrase (EC 3.6.1.5) (adenosine diphosphatase) (ADPase). Apyrase acts on both ATP and ADP to produce
AMP.

- Mammalian vascular ATP-diphosphohydrolase (EC 3.6.1.5) (also known as lymphoid cell activation antigen CD39).

- Toxoplasma gondii nucleoside-triphosphatases (EC 3.6.1.15) (NTPase). NTPase hydrolyses various nucleoside
triphosphates to produce the corresponding nucleoside mono- and diphosphates. This enzyme is secreted into
the invaded host cell into the parasitophorous vacuole, a specialized compartment where the parasite intracellularly
resides.

- Pea nucleoside-triphosphatases (EC 3.6.1.15) (NTPase).

- Caenorhabditis elegans hypothetical protein C33H5.14.

- Caenorhabditis elegans hypothetical protein R07E4.4.

- Yeast chromosome V hypothetical protein YER005w.

[1808] The above uncharacterized proteins all seem to be membrane-bound.

[1809] All these proteins share a number of conserved domains. The best conserved of these domains has been
selected. It is located in the central section of the proteins.

[1810] Consensus pattern [LIVM]-x-G-x(2)-E-G-x-[FY]-x-[FW]-[LIVA]-[TAG]-x-N-[HY]
[ 1] Handa M., Guidotti G. Biochem. Biophys. Res. Commun. 218:916-923(1996).

[ 2] Vasconcelos E.G., Ferreira S.T., de Carvalho T.M.U., de Souza W., Kettlun A.M., Mancilla M., Valenzuela M.A.,
Verjovski-Almeida S. J. Biol. Chem. 271:22139-22145(1996).

[1811] 771. GTP cyclohydrolase I signatures

PROSITE cross-reference(s): GTP_CYCLOHYDROL_1_1. GTP_CYCLOHYDROL_1_2
GTP cyclohydrolase I (EC 3.5.4.16) catalyzes the biosynthesis of formic acid and dihydroneopterin triphosphate from
GTP. This reaction is the first step in the biosynthesis of tetrahydrofolate in prokaryotes, of tetrahydrobiopterin in
vertebrates, and of pteridine-containing pigments in insects.

[1812] GTP cyclohydrolase I is a protein of from 190 to 250 amino acid residues. The comparison of the sequence
of the enzyme from bacterial and eukaryotic sources shows that the structure of this enzyme has been extremely well
conserved throughout evolution [1].

[1813] Two conserved regions were selected as signature patterns. The first contains a perfectly conserved
tetrapeptide which is part of the GTP-binding pocket [2]; the second region also contains conserved residues involved
in GTP-binding.

[1814] Consensus pattern [DEN]-[LIVM](2)-x(2)-[KRNQ]-[DEN]-[LIVM]-x(3)-[ST]-x-C-E-H-H
Consensus pattern [SA]-x-[RK]-x-Q-[LIVM]-Q-E-[RN]-[LI]-[TSN]

[ 1] Maier J., Witter K., Guetlich M., Ziegler I., Werner T., Ninnemann H. Biochem. Biophys. Res. Commun.
212:705-711(1995).

[ 2] Nar H., Huber R., Meining W., Schmid C., Weinkauf S., Bacher A. Structure 3:459-466(1995).
[1815] 772. IlvC. Acetohydroxy acid isomeroreductase

Acetohydroxy acid isomeroreductase catalyses the conversion of acetohydroxy acids into dihydroxy valerates. This
reaction is the second in the synthetic pathway of the essential branched side chain amino acids valine and isoleucine.
Number of members: 29

[1816] [1] Medline: 97361822. The crystal structure of plant acetohydroxy acid isomeroreductase complexed with
NADPH, two magnesium ions and a herbicidal transition state analog determined at 1.65 A resolution. Biou V, Dumas
R, Cohen-Addad C, Douce R, Job D, Pebay-Peyroula E; EMBO J 1997;16:3405-3415.
[1817] 773. Prokaryotic membrane lipoprotein lipid attachment site
PROSITE cross-reference(s): PROKAR_LIPOPROTEIN

In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific
lipoprotein signal peptidase (signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream
of a cysteine residue to which a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such
processing currently include (for recent listings see [1,2,3]):

- Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp).
- Escherichia coli lipoprotein-28 (gene nlpA).
- Escherichia coli lipoprotein-34 (gene nlpB).
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X Moreras. BatesPA. W,fteheadPC. Co^er/.. Ha^bucherK. ftesh sm^^^^ 
[1800] 767. Kappa casein

[1802] 768. Chitinases family 18 active site
PROSITE cross-reference(s): CHITINASE_18

a) Chitinases from: 

■ S^Tl' ^ Atteromonas. BacMlus. Seiratia. Streplomyces. etc 
Plants such as Arabidopsis. cucumber, bean, tobacco etc 

' '^P'^'«*"'"'f*'^oP"s.Saccharom^^^^ 

- Nematode (Brugia malayi).

- Insects (Manduca sexta).

- Baculoviruses (Autographa californica nuclear polyhedrosis virus).

b) Other proteins:

- Hevamine, a rubber tree protein with chitinase and lysozyme activities.

- Kluyveromyces lactis killer toxin alpha subunit, which acts as a chitinase.

- Flavobacterium and Streptomyces endo-beta-N-acetylglucosaminidases (EC 3.2.1.96).

the best conserved region in these proteins ^ ' S'u'^a » at the extremity ot 

[1804] Consensus pattern [LIVMFY]-[DN]-G-[LIVMF]-[DN]-[LIVMF]-[DN]-x-E [E is the active site residue]

[ 1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).
[ 2] Henrissat B. Biochem. J. 280:309-316(1991).

[ 3] Watanabe T., Kobori K., Miyashita K., Fujii T., Sakai H., Uchida M., Tanaka H. J. Biol. Chem. 268:18567-18572(1993).
[ 4] Perrakis A., Tews I., Dauter Z., Oppenheim A.B., Chet I., Wilson K.S., Vorgias C.E. Structure 2:1169-1180(1994).
[ 5] van Scheltinga A.C.T., Kalk K.H., Beintema J.J., Dijkstra B.W. Structure 2:1181-1189(1994).
[1805] 769. gag_p17. gag gene protein p17 (matrix protein)

v^nJsXrrS mlXs'l sT""^' '^"^"^ ^ ^'^ ^ — hnmunodef^iency 

.erisirrSiaSrM^^^^^^ f ^ «.munodeficiency virus type 1 matrix pro- 

PROSITE cross-reference(s): GDA1_CD39_NTPASE
[1822] Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions
are present [2,5,8]. Signature patterns from two of these regions have been derived.

[1823] Consensus pattern [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus patternIGSTALVFl-{DENQHRKPHGSTAHUVMF)r[DE]-R-IUVMFJ-x4LIVM 

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[ 2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[ 3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).
[ 4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[ 5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).
[ 6] Cusack S. Biochimie 75:1077-1081(1993).
[ 7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature 347:249-255(1990).
[ 8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).

[1824] 775. X Trans-activation protein X

This protein is found in hepadnaviruses where it is indispensable for replication. Number of members: 91
[1825] 776. Thymidylate synthase active site

[1826] Thymidylate synthase (EC 2.1.1.45) [1,2] catalyzes the reductive methylation of dUMP to dTMP with
concomitant conversion of 5,10-methylenetetrahydrofolate to dihydrofolate. Thymidylate synthase plays an essential role
in DNA synthesis and is an important target for certain chemotherapeutic drugs.

[1827] Thymidylate synthase is an enzyme of about 30 to 35 Kd in most species except in protozoan and plants
where it exists as a bifunctional enzyme that includes a dihydrofolate reductase domain.

[1828] A cysteine residue is involved in the catalytic mechanism (it covalently binds the 5,6-dihydro-dUMP
intermediate). The sequence around the active site of this enzyme is conserved from phages to vertebrates.

[1829] Consensus pattern R-x(2)-[LIVM]-x(3)-[FW]-[QN]-x(8,9)-[LV]-x-P-C-[HAVM]-x(3)-[QMT]-[FYW]-x-[LV] [C is
the active site residue]

[ 1] Benkovic S.J. Annu. Rev. Biochem. 49:227-251(1980).

[ 2] Ross P., O'Gara F., Condon S. Appl. Environ. Microbiol. 56:2156-2163(1990).

[1830] 777. Glycosyl hydrolases family 31 signatures 

[1831] It has been shown [1,2,3,E1] that the following glycosyl hydrolases can be, on the basis of sequence
similarities, classified into a single family:

- Lysosomal alpha-glucosidase (EC 3.2.1.20) (acid maltase) is a vertebrate glycosidase active at low pH, which
hydrolyzes alpha(1->4) and alpha(1->6) linkages in glycogen, maltose, and isomaltose.

- Alpha-glucosidase (EC 3.2.1.20) from the yeast Candida tsukubaensis.

- Alpha-glucosidase (EC 3.2.1.20) (gene malA) from the archaebacteria Sulfolobus solfataricus.

- Intestinal sucrase-isomaltase (EC 3.2.1.48 / EC 3.2.1.10) is a vertebrate membrane-bound, multifunctional enzyme
complex which hydrolyzes sucrose, maltose and isomaltose. The sucrase and isomaltase domains of the enzyme
are homologous (41% of amino acid identity) and have most probably evolved by duplication.

- Glucoamylase 1 (EC 3.2.1.3) (glucan 1,4-alpha-glucosidase) from various fungal species.

- Yeast hypothetical protein YBR229c.

- Fission yeast hypothetical protein SpAC30D11.01c.

[1832] An aspartic acid has been implicated [4] in the catalytic activity of sucrase, isomaltase, and lysosomal
alpha-glucosidase. The region around this active residue is highly conserved and can be used as a signature pattern. A
second region, which contains two conserved cysteines, has been used as an additional signature pattern.

[1833] Consensus pattern [GF]-[LIVMF]-W-x-D-M-[NSA]-E [D is the active site residue]

Consensus pattem G-[AV]-D-IUVMTA]-C-G-IFY]-x(3)-(ST]-x(3)-L-C-x-R-W-x(2)-[LV]-(GSAJ-[SA]-F x-P-F-x-R-^^ 

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Kinsella B.T., Hogan S., Larkin A., Cantwell B.A. Eur. J. Biochem. 202:657-664(1991).

[ 3] Naim H.Y., Niermann T., Kleinhans U., Hollenberg C.P., Strasser A.W.M. FEBS Lett. 294:109-112(1991).

[ 4] Hermans M.M.P., Kroos M.A., van Beeumen J., Oostra B.A., Reuser A.J.J. J. Biol. Chem. 266:13507-13512(1991).



[1834] 778. Urease signatures 
- Escherichia coli lipoprotein nlpC.

- Escherichia coli lipoprotein nlpD.

- Escherichia coli osmotically inducible lipoprotein B (gene osmB).

- Escherichia coli osmotically inducible lipoprotein E (gene osmE).

- Escherichia coli peptidoglycan-associated lipoprotein (gene pal).

- Escherichia coli rare lipoproteins A and B (genes rplA and rplB).

- Escherichia coli copper homeostasis protein cutF (or nlpE).

- Escherichia coli plasmids traT proteins.

- Escherichia coli Col plasmids lysis proteins.

- A number of Bacillus beta-lactamases.

' ^"""^r^ P^"P*^^« oligopeptide^inding protein (gene oppA) 

- Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB).

' ^^^^^^^ (9enevmp21)and7(genevmp7) 

■ e^»^y^««ra*^homatis outer membrane protein 3 ^ 

■ '^»^«>acter succinogenes endoglucanase cel-3. 

- Haemophilus influenzae proteins Pal and Pcp.

- Klebsiella pullulanase (gene pulA).

- Klebsiella pullulanase secretion protein pulS.

- Mycoplasma hyorhinis protein p37.

^0 - IJ^coplasma hyorhinis variant surface antigens A. B. and C (genes 

■ Neissena outer membrane protein H.a vi^mo^- 

- Pseudomonas aeruginosa lipopeptide (gene lppL).

- Pseudomonas solanacearum endoglucanase egl.

- Shigella flexneri invasion plasmid proteins mxiJ and mxiM.

- Streptococcus pneumoniae oligopeptide transport protein A (gene amiA).

- Treponema pallidum 34 Kd antigen.

- Treponema pallidum membrane protein A (gene tmpA).

- Vibrio harveyi chitobiase (gene chb).

- Yersinia virulence plasmid protein yscJ.

^ • a?cSr;erp:rs:^:£™-^^^^^ 

fdenZ ^^Z~:^r^i£,^^ '^'^^ ^ — and a set ot rules to 

- be at least one Lys or on'e Arg ^ ^Z^ZTS^J^^Z', """"" ''^'^ 

®- ^ Biomembr 22:451-471(1990) 

[ 2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988).
[ 3] von Heijne G. Protein Eng. 2:531-534(1989).
[ 4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945(1994).

SSSi J^"*" ^"<»cyi-transfer RNA synthetases class-II signatures 

' 6.m' ^^ar Am^cyt-tRNA synthetases (EC 
«rst step in pro.ei^ ^'"sy^^eTi^^^^^ as\he 
synthetases, one for each different amino «riH in I. V T different types of aminoacyl-tRN A 
[1844] Structurally, all known receptor PTPases are made up of a variable length extracellular domain, followed by
a transmembrane region and a C-terminal catalytic cytoplasmic domain. Some of the receptor PTPases contain
fibronectin type III (FN-III) repeats, immunoglobulin-like domains, MAM domains or carbonic anhydrase-like domains
in their extracellular region. The cytoplasmic region generally contains two copies of the PTPase domain. The first
seems to have enzymatic activity, while the second is inactive but seems to affect substrate specificity of the first. In
these domains, the catalytic cysteine is generally conserved but some other, presumably important, residues are not.
[1845] In the following table, the domain structure of known receptor PTPases is shown:



                                          Extracellular                Intracellular
                                          Ig     FN-3    CAH    MAM    PTPase
Leukocyte common antigen (LCA) (CD45)     0      2       0      0      2
Leukocyte antigen related (LAR)           3      8       0      0      2
Drosophila DLAR                           3      9       0      0      2
Drosophila DPTP                           2      2       0      0      2
PTP-alpha (LRP)                           0      0       0      0      2
PTP-beta                                  0      16      0      0      1
PTP-gamma                                 0      1       1      0      2
PTP-delta                                 0      >7      0      0      2
PTP-epsilon                               0      0       0      0      2
PTP-kappa                                 1      4       0      1      2
PTP-mu                                    1      4       0      1      2
PTP-zeta                                  0      1       1      0      2


[1846] PTPase domains consist of about 300 amino acids. There are two conserved cysteines; the second one has
been shown to be absolutely required for activity. Furthermore, a number of conserved residues in its immediate vicinity
have also been shown to be important.
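
The domain table for the receptor PTPases shown above can also be held as structured records, which makes statements such as "the cytoplasmic region generally contains two copies of the PTPase domain" easy to check. The following is a sketch only: the record type Receptor and its field names are hypothetical, and the FN-3 count is kept as text because one entry is given as ">7".

```python
from typing import NamedTuple


class Receptor(NamedTuple):
    name: str
    ig: int       # immunoglobulin-like domains (extracellular)
    fn3: str      # fibronectin type III repeats, kept as text (">7")
    cah: int      # carbonic anhydrase-like domains
    mam: int      # MAM domains
    ptpase: int   # intracellular PTPase domains


# Values transcribed from the table in the text.
RECEPTORS = [
    Receptor("LCA (CD45)", 0, "2", 0, 0, 2),
    Receptor("LAR", 3, "8", 0, 0, 2),
    Receptor("Drosophila DLAR", 3, "9", 0, 0, 2),
    Receptor("Drosophila DPTP", 2, "2", 0, 0, 2),
    Receptor("PTP-alpha (LRP)", 0, "0", 0, 0, 2),
    Receptor("PTP-beta", 0, "16", 0, 0, 1),
    Receptor("PTP-gamma", 0, "1", 1, 0, 2),
    Receptor("PTP-delta", 0, ">7", 0, 0, 2),
    Receptor("PTP-epsilon", 0, "0", 0, 0, 2),
    Receptor("PTP-kappa", 1, "4", 0, 1, 2),
    Receptor("PTP-mu", 1, "4", 0, 1, 2),
    Receptor("PTP-zeta", 0, "1", 1, 0, 2),
]

# Receptors that depart from the usual two intracellular PTPase domains.
not_two_domains = [r.name for r in RECEPTORS if r.ptpase != 2]

# Receptors carrying a MAM domain in the extracellular region.
with_mam = [r.name for r in RECEPTORS if r.mam > 0]
```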

[1847] A signature pattern was derived for PTPase domains centered on the active site cysteine. 

[1848] There are three profiles for PTPases. the first one spans the complete domain and is not specific to any 

subtype. The second profile is specific to dual-specificity PTPases and the third one to the PTP subfamily 

[1849] Consensus pattem ILIVMFl-H-C-x(2)-G.x(3)-[STC]-{STAGP]-x.[LIVMFY) [C is the active site residuej 

[1850] Notethe M-phase inducer phosphatases (cdc25-type phosphatase) are tyrosine-protein phosphatases that 

are not structurally related to the above PTPases. 

[1851] Notethis documentation entry is linked to both a signature pattem and to profiles. As profiles are much more 
sensitive than the pattern, you shouW use them if you have access to the necessary software tools to do so. 
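Consensus patterns of this kind can be applied to a sequence by translating them into regular expressions. The sketch below is a minimal illustration, not the tool used to derive the signatures: the function name `prosite_to_regex` and the test peptide are hypothetical, and only the pattern elements that occur in this document are handled (single residues, `[..]` sets, `{..}` exclusions, `x`, and repeat counts such as `x(2)` or `x(4,5)`).

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        # Separate an optional repeat suffix such as "(2)" or "(4,5)".
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # x stands for any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {..} lists residues that are excluded
        # "[..]" sets and single letters are already valid regex syntax.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)


PTPASE_PATTERN = "[LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY]"
PTPASE_REGEX = prosite_to_regex(PTPASE_PATTERN)
print(PTPASE_REGEX)  # [LIVMF]HC.{2}G.{3}[STC][STAGP].[LIVMFY]

# Hypothetical peptide used only to exercise the pattern.
peptide = "AAVHCSAGVGRSGTFAA"
hit = re.search(PTPASE_REGEX, peptide)
print(hit.start(), hit.group())  # 2 VHCSAGVGRSGTF
```

A regular expression reproduces only the signature pattern; it does not replace the more sensitive profiles mentioned above.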

[ 1] Fischer E.H., Charbonneau H., Tonks N.K. Science 253:401-406(1991).
[ 2] Charbonneau H., Tonks N.K. Annu. Rev. Cell Biol. 8:463-493(1992).
[ 3] Trowbridge I.S. J. Biol. Chem. 266:23517-23520(1991).
[ 4] Tonks N.K., Charbonneau H. Trends Biochem. Sci. 14:497-500(1989).
[ 5] Hunter T. Cell 58:1013-1016(1989).

[1852] 780. Connexins signatures

[1853] Gap junctions [1] are specialized regions of the plasma membrane which consist of closely packed pairs of
transmembrane channels, the connexons, through which small molecules diffuse from a cell to a neighboring cell. Each
connexon is composed of a hexamer of an integral membrane protein which is often referred to as connexin. In a
given species there are a number of different, yet structurally related, tissue-specific forms of connexins. The types
of connexins which are currently known are listed below.

- Connexin 56 (Cx56).

- Connexin 50 (Cx50) (lens fiber protein MP70).

- Connexin 46 (Cx46) (alpha-3).

- Connexin 45 (Cx45) (alpha-6).

- Connexin 43 (Cx43) (alpha-1).

- Connexin 40 (Cx40) (alpha-5).

- Connexin 38 (Cx38) (alpha-2).
[ 1] Takishima K., Suga T., Mamiya G. Eur. J. Biochem. 175:151-165(1988).

[ 2] Mobley H.L.T., Hausinger R.P. Microbiol. Rev. 53:85-108(1989).

[ 3] Jabri E., Carr M.B., Hausinger R.P., Karplus P.A. Science 268:998-1004(1995).

Tyrosine specific protein phosphatases signature and profiles 

[1841] Soluble PTPases.

- PTPN1 (PTP-1B).

- PTPN2 (T-cell PTPase; TC-PTP).

PTPN8 fzoz^EP) Protein-lyrosine phosphatase; HePTP). suogroup. 

- PTPN9 (MEG2). 

- PTPN12 (PTP-G1; PTP-P19).

- Yeast PTP1.

- Yeast PTP2 which may be involved in the ubiquitin-mediated protein degradation pathway.

- Fission yeast pyp1 and pyp2 which play a role in inhibiting the onset of mitosis.
- Fission yeast pyp3 which contributes to the dephosphorylation of cdc2.

- Yeast CDC14 which may be involved in chromosome segregation.

- Yersinia virulence plasmid PTPases (gene yopH).

- Autographa californica nuclear polyhedrosis virus 19 Kd PTPase.

[1842] Dual specificity PTPases. 

■ Z'Z-'r''- """^ P^P'^tase-l; MKP-„; wh^ Cephospho^iates MAP Kinase on both Thr-183 

^^r"^''^ " dephosphoo.la.es MAP kinases ERK1 and ERK2 on both Thr and Tyr 

- DUSP3 (VHR). 

- DUSP4 (HVH2). 

- DUSP5 (HVH3). 

- DUSP6 (Pyst1; MKP-3).

- DUSP7 (Pyst2; MKP-X).

- Yeast MSG5, a PTPase that dephosphorylates MAP kinase FUS3.

- Yeast YVH1.

- Vaccinia virus H1 PTPase; a dual specificity phosphatase.
[1843] Receptor PTPases.
[1860] It has been proposed that this hexapeptide sequence is responsible for a post-translational modification
necessary for the proper anchoring of the proteins which bear it to the cell wall.
Proteins known to contain such a hexapeptide are listed below:

- Aggregation substance from Streptococcus faecalis (asa1).
- C5a peptidase from Streptococcus pyogenes (scpA).
- C protein alpha-antigen from Streptococcus agalactiae (bca).
- Cell surface antigen I/II (PAC) from Streptococcus mutans.
- Dextranase from Streptococcus downei (dex).
- Fibronectin-binding protein from Staphylococcus aureus (fnbA).
- Fimbrial subunits from Actinomyces naeslundii and viscosus.
- IgA binding protein from Streptococcus pyogenes (arp4).
- IgA binding protein (B antigen) from Streptococcus agalactiae (bag).
- IgG binding proteins from Streptococci and Staphylococcus aureus.
- Internalin A from Listeria monocytogenes (inlA).
- M proteins from streptococci.
- Muramidase-released protein from Streptococcus suis (mrp).
- Nisin leader peptide processing protease from Lactococcus lactis (nisP).
- Protein A from Staphylococcus aureus.
- Trypsin-resistant surface T protein from streptococci.
- Wall-associated protein from Streptococcus mutans (wapA).
- Wall-associated serine proteinases from Lactococcus lactis.

[1861] Consensus pattern: L-P-x-T-G-[STGAVDE]

[1862] [ 1] Schneewind O., Jones K.F., Fischetti V.A. J. Bacteriol. 172:3310-3317(1990).
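As an illustration of how the anchoring hexapeptide might be located in a protein sequence, the sketch below scans for every occurrence of the consensus L-P-x-T-G-[STGAVDE] and reports where it lies relative to the end of the sequence. The regular expression is hand-translated from the pattern; the function name `find_anchor_motifs` and the toy sequence are hypothetical and are not taken from any of the proteins listed above.

```python
import re

# Hand-translated from the consensus pattern L-P-x-T-G-[STGAVDE].
ANCHOR_MOTIF = re.compile(r"LP.TG[STGAVDE]")


def find_anchor_motifs(sequence):
    """Return (1-based start, matched hexapeptide, residues after the motif)."""
    hits = []
    for m in ANCHOR_MOTIF.finditer(sequence):
        hits.append((m.start() + 1, m.group(), len(sequence) - m.end()))
    return hits


# Toy sequence made up for this example.
toy = "MKKTAAEELPETGEAAALLLVVVKRRK"
print(find_anchor_motifs(toy))  # [(9, 'LPETGE', 13)]
```

Reporting the number of residues that follow the motif is a design choice of this sketch: it makes it easy to see how much sequence remains downstream of the hexapeptide.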
[1863] 782. Gamma-glutamyltranspeptidase signature 

[1864] Gamma-glutamyltranspeptidase (EC 2.3.2.2) (GGT) [1] catalyzes the transfer of the gamma-glutamyl moiety
of glutathione to an acceptor that may be an amino acid, a peptide or water (forming glutamate). GGT plays a key role
in the gamma-glutamyl cycle, a pathway for the synthesis and degradation of glutathione. In prokaryotes and
eukaryotes, it is an enzyme that consists of two polypeptide chains, a heavy and a light subunit, processed from a single
chain precursor. The active site of GGT is known to be located in the light subunit.

[1865] The sequences of mammalian and bacterial GGT show a number of regions of high similarity [2]. Pseudomonas
cephalosporin acylases (EC 3.5.1.-) that convert 7-beta-(4-carboxybutanamido)-cephalosporanic acid (GL-7ACA)
into 7-aminocephalosporanic acid (7ACA) and glutaric acid are evolutionarily related to GGT and also show
some GGT activity [3]. Like GGT, these GL-7ACA acylases are also composed of two subunits.
[1866] One of the conserved regions corresponds to the N-terminal extremity of the mature light chains of these
enzymes. This region has been used as a signature pattern.

[1867] Consensus pattern: T-[STA]-H-x-[ST]-[LIVMA]-x(4)-G-[SN]-x-V-[STA]-x-T-x-T-[LIVM]-[NE]-x(1,2)-[FY]-G

[ 1] Tate S.S., Meister A. Meth. Enzymol. 113:400-419(1985).

[ 2] Suzuki H., Kumagai H., Echigo T., Tochikura T. J. Bacteriol. 171:5169-5172(1989).

[ 3] Ishiye M., Niwa M. Biochim. Biophys. Acta 1132:233-239(1992).

[1868] 783. Ferrochelatase signature 

[1869] Ferrochelatase (EC 4.99.1.1) (protoheme ferro-lyase) [1,2] catalyzes the last step in heme biosynthesis: the
chelation of a ferrous ion to protoporphyrin IX, to form protoheme.

[1870] In eukaryotes, ferrochelatase is a mitochondrial protein bound to the inner membrane, whose active site faces
the mitochondrial matrix. The mature form of eukaryotic ferrochelatase is composed of about 360 amino acids. In
bacteria, ferrochelatase (gene hemH) [3] is a protein of 310 to 380 amino acids.

[1871] The human autosomal dominant disease protoporphyria is due to the reduced activity of ferrochelatase.
[1872] The signature pattern for this enzyme is based on a conserved region which contains a histidine residue which
could be involved in binding iron.

[1873] Consensus pattern: [LIVMF](2)-x-[ST]-x-H-[GS]-[LIVM]-P-x(4,5)-[DENQKR]-x-G-[DP]-x(1,2)-Y

[ 1] Labbe-Bois R. J. Biol. Chem. 265:7278-7283(1990).

[ 2] Brenner D.A., Frasier F. Proc. Natl. Acad. Sci. U.S.A. 88:849-853(1991).

[ 3] Miyamoto K., Nakahigashi K., Nishimura K., Inokuchi H. J. Mol. Biol. 219:393-398(1991).
- Connexin 37 (Cx37) (alpha-4).

- Connexin 33 (Cx33) (alpha-7).

- Connexin 32 (Cx32) (beta-1).

- Connexin 31.1 (Cx31.1) (beta-4).

- Connexin 31 (Cx31) (beta-3).

- Connexin 30.3 (Cx30.3) (beta-5).

- Connexin 26 (Cx26) (beta-2).

isvariabieaiomSOreskiuesTcSe^^^ 

below. '^^'^"^ ^® schematic representation of this stmcture is shown 
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I18S61 consensus panemC^DNl^xS^^^^ 

sensus PattemC-x(3.4).P-C-x(3HL VMHDeS^^^^^^ ^ '"T"^ ''^""'"^ '^'^^^ 

bonds] " =™n'-l'-TJiuvMH5AHKRJ-P (The three C's are involved in disulfide 

S Hi D-^- Go'iger J.A., Paul D.L Annu. Rev. Biochem 65 475-502n996i 

n£ ?H ^"rtace P-ot^ns -anchoring' hexapeptide ' 

This structure is represented . the .Z^ ' ^"^^^ °' ^ ^ J- 



| Variable length extracellular domain |H| Anchor |B|

'H': conserved hexapeptide.

'B': cluster of basic residues.
dimer, a by-product of nylon manufacture [4].

- Glutamyl-tRNA(Gln) amidotransferase subunit A [5].
- Mammalian fatty acid amide hydrolase (gene FAAH) [6].
- A putative amidase from yeast (gene AMD2).

- Mycobacterium tuberculosis putative amidases amiA2, amiB2, amiC and amiD.

[1881] All these enzymes contain in their central section a highly conserved region rich in glycine, serine, and alanine
residues. This region has been used as a signature pattern.

Consensus pattern: G-[GAl-S-[GS]^GS]-G-x-[GSA]-{GSA\nr]-x^UVM]-[GSA]-x(6)^GSATl-x-^GA^x-fDE^x^ 
[UVM]-R-x-P-[GSAC] I J I J -I J 

[ 1] Mayaux J.-F., Cerbelaud E., Soubrier F., Faucher D., Petre D. J. Bacteriol. 172:6764-6773(1990).

[ 2] Hashimoto Y., Nishiyama M., Ikehata O., Horinouchi S., Beppu T. Biochim. Biophys. Acta 1088:225-233(1991).

[ 3] Chang T.-H., Abelson J. Nucleic Acids Res. 18:7180-7180(1990).

[ 4] Tsuchiya K., Fukuyama S., Kanzaki N., Kanagawa K., Negoro S., Okada H. J. Bacteriol. 171:3187-3191(1989).
[ 5] Curnow A.W., Hong K.W., Yuan R., Kim S.I., Martins O., Winkler W., Henkin T.M., Soll D. Proc. Natl. Acad.
Sci. U.S.A. 94:11819-11826(1997).

[ 6] Cravatt B.F., Giang D.K., Mayfield S.P., Boger D.L., Lerner R.A., Gilula N.B. Nature 384:83-87(1996).
[1882] 786. Glycosyl hydrolases family 10 active site 

[1883] The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases
(EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria
produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities,
can be classified into families. One of these families is known as the cellulase family F [3] or as the glycosyl hydrolases
family 10 [4,E1]. The enzymes which are currently known to belong to this family are listed below.

- Aspergillus awamori xylanase A (xynA).
- Bacillus sp. strain 125 xylanase (xynA).
- Bacillus stearothermophilus xylanase.
- Butyrivibrio fibrisolvens xylanases A (xynA) and B (xynB).
- Caldocellum saccharolyticum bifunctional endoglucanase/exoglucanase (celB). This protein consists of two
  domains; it is the N-terminal domain, which has exoglucanase activity, which belongs to this family.
- Caldocellum saccharolyticum xylanase A (xynA).
- Caldocellum saccharolyticum ORF4. This hypothetical protein is encoded in the xynABC operon and is probably
  a xylanase.
- Cellulomonas fimi exoglucanase/xylanase (cex).
- Clostridium stercorarium thermostable celloxylanase.
- Clostridium thermocellum xylanases Y (xynY) and Z (xynZ).
- Cryptococcus albidus xylanase.
- Penicillium chrysogenum xylanase (gene xylP).
- Pseudomonas fluorescens xylanases A (xynA) and B (xynB).
- Ruminococcus flavefaciens bifunctional xylanase XYLA (xynA). This protein consists of three domains: an
  N-terminal xylanase catalytic domain that belongs to family 11 of glycosyl hydrolases; a central domain composed of
  short repeats of Gln, Asn and Trp; and a C-terminal xylanase catalytic domain that belongs to family 10 of glycosyl
  hydrolases.
- Streptomyces lividans xylanase A (xlnA).
- Thermoanaerobacter saccharolyticum endoxylanase A (xynA).
- Thermoascus aurantiacus xylanase.
- Thermophilic bacterium Rt8.B4 xylanase (xynA).

[1884] One of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has
been shown [5], in the exoglucanase from Cellulomonas fimi, to be directly involved in glycosidic bond cleavage by
acting as a nucleophile. This region has been used as a signature pattern.

[1885] Consensus pattern: [GTA]-x(2)-[LIVN]-x-[IVMF]-[ST]-E-[LIY]-[DN]-[LIVMF] [E is the active site residue]
[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).

[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).






??*-^''"'««*"««"9*™i". bacterial type 

EC ?^1^^rKSySS~^-^^^ 

[187q StructuralV. cellulases and >o,bnfLl „ "'Vtenases (EC a2.1.8) (11 ^ 

12J. B^as Known to'S^'sSiTSir;^" '"^ '''^ '° ^^^<^ ac« «s«ues 
f endl ) f rom Butyrivibrio fibrisolvens 

- Endoglucanase A (gene celA) from Microbispora bispora.

- i^trjis^e^^ts^^ 

- OMn^e 63 (EC 3.2.1.!4, f SSSTp^R^r """'^ 

- Chitinase C from Streptomyces lividans.

. ?e .S'^'el^ir r^ra:?; " "^^ "-^'-^ --"-•V these enzy.es As « is 
extremity of the domain - which havS be^ S^TsTtol ^ '^^"^ ^BD domai?^^ aTei^ 

..Ptophan rescues eou« be ^^illS^^tr^rcsSXT^ ^ '^^ 



xCxxxxWxxxxxNxxxWxx«cxxxWxxxxxxxxWNxxxxxGxxxxxxxx«^ 



'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.



consensus Pa«emW-N-,STA0BHST0NHUVM,.x(2HGST,.x-,3S«^ (UVM^TJ-IOAJ 

31 Gilkes N.R.. Claeyssens M.. AeteJ^TnenriaZ Z^^^^^ 4:349-353(1 99,) 

J.. Miller R.C. Jr. Eur. J. Biochem. 202:367-377(1 Mi^ - barren R.A. 

[1879] 785. Amidases signature 

uiiunary related. These enzymes are listed below 

Indoieacetamide hydrolase fEC 3 5 i ^ k 

dole-3^.amide (lAM) into indole-3-aceit?(IMT'tSe J^^^ '''' '^'^'^"^ •^V^^oly^'s o. in- 

• Acetamidase from Emericella nidolans Se a^l^ *" ""^ "^^synthesis o, auxins from tryptopteJ 

carbon or nitrogen source. ^ ^ ^ "^^^^ allows acetamide to be used i a 

Amidase (EC 3.5.1.4) trom Rhodococcos so H77a ^ o 

hydrolyzes propi««mides efficiently, and Z a^Ji^^enl^lZTT I' "^'^ ™8 enzyme 

Am«tese (EC 3.5.1.4) from Pseudomonas chlororlphr ^' ^"'^'^ ^ indoieacetamide. 

b-ammohexanoate-cyclic-dimer hydrolase fEC -T «; p i qw„ .■ 
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).



[1896] 789. Formate-tetrahydrofolate ligase signatures

[1897] Formate-tetrahydrofolate ligase (EC 6.3.4.3) (formyltetrahydrofolate synthetase) (FTHFS) is one of the
enzymes participating in the transfer of one-carbon units, an essential element of various biosynthetic pathways. In many
of these processes the transfers of one-carbon units are mediated by the coenzyme tetrahydrofolate (THF). Various
reactions generate one-carbon derivatives of THF which can be interconverted between different oxidation states by
FTHFS, methylenetetrahydrofolate dehydrogenase (EC 1.5.1.5) and methenyltetrahydrofolate cyclohydrolase (EC
3.5.4.9).

[1898] In eukaryotes the FTHFS activity is expressed by a multifunctional enzyme, C-1-tetrahydrofolate synthase
(C1-THF synthase), which also catalyzes the dehydrogenase and cyclohydrolase activities. Two forms of C1-THF
synthases are known [1]: one is located in the mitochondrial matrix, while the second one is cytoplasmic. In both forms
the FTHFS domain consists of about 600 amino acid residues and is located in the C-terminal section of C1-THF
synthase. In prokaryotes FTHFS activity is expressed by a monofunctional homotetrameric enzyme of about 560 amino
acid residues [2].

[1899] The sequence of FTHFS is highly conserved in all forms of the enzyme. As signature patterns, two regions
that are almost perfectly conserved were selected. The first one is a glycine-rich segment located in the N-terminal
part of FTHFS and which could be part of an ATP-binding domain [2]. The second pattern is located in the central
section of FTHFS.

[1900] Consensus pattern: G-[LIVM]-K-G-G-A-A-G-G-G-Y
Consensus pattern: V-A-T-[IV]-R-A-L-K-x-[HN]-G-G
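Where a family is described by two signatures, as here, a sequence can be checked for both and for their relative order (the first signature is N-terminal, the second lies in the central section). The following sketch assumes hand-translated regular expressions for the two FTHFS patterns; the function name `fthfs_signature_hits` and the toy sequence are hypothetical.

```python
import re

# Hand-translated from the two consensus patterns given for FTHFS.
SIG_N_TERMINAL = re.compile(r"G[LIVM]KGGAAGGGY")
SIG_CENTRAL = re.compile(r"VAT[IV]RALK.[HN]GG")


def fthfs_signature_hits(sequence):
    """Report the 0-based start of each signature, or None if it is absent."""
    n = SIG_N_TERMINAL.search(sequence)
    c = SIG_CENTRAL.search(sequence)
    return {
        "n_terminal": n.start() if n else None,
        "central": c.start() if c else None,
        # True only if both signatures occur and the glycine-rich one comes first.
        "both_in_order": bool(n and c and n.end() <= c.start()),
    }


# Toy sequence made up for this example; it is not a real FTHFS sequence.
toy = "MS" + "GLKGGAAGGGY" + "AAAAA" + "VATIRALKEHGG" + "KK"
print(fthfs_signature_hits(toy))
# {'n_terminal': 2, 'central': 18, 'both_in_order': True}
```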

[ 1] Shannon K.W., Rabinowitz J.C. J. Biol. Chem. 263:7717-7725(1988).

[ 2] Lovell C.R., Przybyla A., Ljungdahl L.G. Biochemistry 29:5687-5694(1990).

[1901] 790. Transthyretin signatures

[1902] Transthyretin (prealbumin) [1] is a thyroid hormone-binding protein that seems to transport thyroxine (T4)
from the bloodstream to the brain. It is a protein of about 130 amino acids that assembles as a homotetramer and
forms an internal channel that binds thyroxine. Transthyretin is mainly synthesized in the brain choroid plexus. In
humans, variants of the protein are associated with distinct forms of amyloidosis.

[1903] The sequence of transthyretin is highly conserved in vertebrates. A number of uncharacterized proteins also
belong to this family:

- Escherichia coli hypothetical protein yedX.
- Bacillus subtilis hypothetical protein yunM.
- Caenorhabditis elegans hypothetical protein R09H10.3.
- Caenorhabditis elegans hypothetical protein ZK697.8.

[1904] Two regions were selected as signature patterns. The first, located in the N-terminal extremity, starts with a
lysine known to be involved in binding T4. The second pattern is located in the C-terminal extremity.

[1905] Consensus pattern: [KH]-[IV]-L-[DN]-x(3)-G-x-P-A-x(2)-[IV]-x-[IV] [The K binds thyroxine]

Consensus pattern: Y-[TH]-[IV]-[AP]-x(2)-L-S-[PQ]-[FYW]-[GS]-[FY]-[QS]

[1906] [ 1] Schreiber G., Richardson S.J. Comp. Biochem. Physiol. 116B:137-160(1997).

[1907] 791. Dihydropteroate synthase signatures 

[1908] All organisms require reduced folate cofactors for the synthesis of a variety of metabolites. Most
microorganisms must synthesize folate de novo because they lack the active transport system of higher vertebrate cells which
allows these organisms to use dietary folates. Enzymes that are involved in the biosynthesis of folates are therefore
the target of a variety of antimicrobial agents such as trimethoprim or sulfonamides.

[1909] Dihydropteroate synthase (EC 2.5.1.15) (DHPS) catalyzes the condensation of 6-hydroxymethyl-7,8-dihydropteridine
pyrophosphate to para-aminobenzoic acid to form 7,8-dihydropteroate. This is the second step in the three-step
pathway leading from 6-hydroxymethyl-7,8-dihydropterin to 7,8-dihydrofolate. DHPS is the target of sulfonamides,
which are substrate analogs that compete with para-aminobenzoic acid.

[1910] Bacterial DHPS (gene sul or folP) [1] is a protein of about 275 to 315 amino acid residues which is either
chromosomally encoded or found on various antibiotic resistance plasmids. In the lower eukaryote Pneumocystis
carinii, DHPS is the C-terminal domain of a multifunctional folate synthesis enzyme (gene fas) [2].

[1911] Two signature patterns for DHPS were developed. The first signature is located in the N-terminal section of
these enzymes, while the second signature is located in the central section.

[1912] Consensus pattern: [LIVM]-x-[AG]-[LIVMF](2)-N-x-T-x-D-S-F-x-D-x-[SG]
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[41 Henrissat B. Biochem. J. 280:309-316(1991) 

[ 5, Tu„ D.. Whhe. S.G.. Gilkes N.R. Kitoum O.G.. W^en a/.j.. AebersoW R J. Bid Chem. 266:15621-15625 
fIffS f'™'='<»«*'sphosphale aldolase ciass-ll signatures 

zinc - tor their activity. homod«nenc enzymes wh^S, require a divalent metal ion - generally 

[1888] This family also includes the following proteins:

- Escherichia coli N-acetyl galactosamine operon protein agaY which catalyzes the same reaction as that of gatY.
binding a zinc ion. The se^u^fe "^T^' shown [4J to be involved in 

Consensus pattern: [LIVM]-E-x-E-[LIVM]-G-x(2)-[GM]-[GSTA]-x-E

[ 1] Perham R.N. Biochem. Soc. Trans. 18:185-187(1990).

[ 2] Marsh J.J., Lebherz H.G. Trends Biochem. Sci. 17:110-113(1992).

'"W^fdase family serine active site 

- Yeast vacuolar dipeptidyl aminopeptidase B (DPAP B) (gene DAP2).

With a free amino-termW^ N-acetylated protein to generate a N-acetylated amino acid and a pr Jein 

[ 1] Rawlings N.D., Polgar L., Barrett A.J. Biochem. J. 279:907-911(1991).

S o "^"^'"^^ "oPPe-Seyler 373:353-360(1992) 

[ 3] Polgar L., Szabo E. Biol. Chem. Hoppe-Seyler 373:361-366(1992).
terminal extremity. The mammalian enzyme differs from the bacterial or yeast proteins by having an EF-hand
calcium-binding region (See <PDOC00018>) in its C-terminal extremity.

[1922] Two signature patterns were developed: one based on the first half of the FAD-binding domain and one which
corresponds to a conserved region in the central part of these enzymes.
[1923] Consensus pattern: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G
Consensus pattern: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A

[ 1] Austin D., Larson T.J. J. Bacteriol. 173:101-107(1991).

[ 2] Roennow B., Kielland-Brandt M.C. Yeast 9:1121-1130(1993).

[ 3] Brown L.J., McDonald M.J., Lehn D.A., Moran S.M. J. Biol. Chem. 269:14363-14366(1994).

[1924] 794. NOL1/NOP2/sun family signature 

[1925] The following proteins seem to be evolutionarily related:

- Mammalian proliferating-cell nucleolar antigen p120 (gene NOL1) which may play a role in the regulation of the
  cell cycle and the increased nucleolar activity that is associated with cell proliferation.

- Yeast nucleolar protein NOP2 (or YNA1) which could be involved in nucleolar function during the onset of growth,
  and in the maintenance of nucleolar structure.

- Yeast hypothetical protein YBL024w.
- Bacterial protein sun (also known as fmu).

- Escherichia coli hypothetical protein yebU.

- Mycobacterium tuberculosis hypothetical protein MtCY21B4.24.

- Methanococcus jannaschii hypothetical protein MJ0026.

NOL1 is a protein of 855 residues, NOP2 consists of 618 residues, YBL024w of 684, sun is a protein of about 430 to
450 residues and MJ0026 has 274 residues. They share a conserved central domain which contains some highly
conserved regions. One of these regions was selected as a signature pattern.
[1926] Consensus pattern: [FV]-D-[KRA]-[LIVMA]-L-x-D-[AV]-P-C-[ST]-[GA]
[1927] 795. moaA / nifB / pqqE family signature

[1928] A number of proteins involved in the biosynthesis of metallo cofactors have been shown [1,2] to be evolutionarily
related. These proteins are:

- Bacterial and archaebacterial protein moaA, which is involved in the biosynthesis of the molybdenum cofactor
  (molybdopterin; MPT).

- Arabidopsis thaliana cnx2, a protein involved in molybdopterin biosynthesis and which is highly similar to moaA.

- Bacillus subtilis narA, which seems to be the moaA ortholog in that bacterium.

- Bacterial protein nifB (or fixZ) which is involved in the biosynthesis of the nitrogenase iron-molybdenum cofactor.

- Bacterial protein pqqE which is involved in the biosynthesis of the cofactor pyrrolo-quinoline-quinone (PQQ).

- Pyrococcus furiosus cmo, a protein involved in the synthesis of a molybdopterin-based tungsten cofactor.

- Caenorhabditis elegans hypothetical protein F49E2.1.

[1929] All these proteins share, in their N-terminal region, a conserved domain that contains three cysteines. In
moaA, these cysteines have been shown [1] to be important for the biological activity. They could be involved in the
binding of an iron-sulfur cluster.

[1930] Consensus pattern: [LIV]-x(3)-C-[NP]-[LIVMF]-[QRS]-C-x-[FYM]-C [The three C's are putative Fe-S ligands]
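Because the three cysteines in this signature are the putative Fe-S ligands, it can be useful to report their positions rather than just whether the pattern matches. The sketch below wraps each cysteine of a hand-translated regular expression in a capturing group; the function name `cysteine_ligand_positions` and the toy peptide are hypothetical and serve only to illustrate the idea.

```python
import re

# Hand-translated from [LIV]-x(3)-C-[NP]-[LIVMF]-[QRS]-C-x-[FYM]-C,
# with each cysteine wrapped in a capturing group.
MOAA_SIGNATURE = re.compile(r"[LIV].{3}(C)[NP][LIVMF][QRS](C).[FYM](C)")


def cysteine_ligand_positions(sequence):
    """Return the 1-based positions of the three signature cysteines, or None."""
    m = MOAA_SIGNATURE.search(sequence)
    if m is None:
        return None
    return tuple(m.start(group) + 1 for group in (1, 2, 3))


# Toy peptide made up for this example.
toy = "MGLTDRCNLRCVYCAA"
print(cysteine_ligand_positions(toy))  # (7, 11, 14)
```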

[ 1] Menendez C., Igloi G., Henninger H., Brandsch R. Arch. Microbiol. 164:142-151(1995).
[ 2] Hoff T., Schnorr K.M., Meyer C., Caboche M. J. Biol. Chem. 270:6100-6107(1995).

[1931] 796. Forkhead-associated (FHA) domain profile 

[1932] The forkhead-associated (FHA) domain [1,E1] is a putative nuclear signalling domain found in a variety of
otherwise unrelated proteins. The FHA domain comprises approximately 55 to 75 amino acids and contains three highly
conserved blocks separated by divergent spacer regions. Currently it has been found in the following proteins:

- Four transcription factors that also contain a forkhead (FH) domain: mouse myocyte nuclear factor 1 (MNF1), yeast
  transcription factor FHL1, which probably controls pre-mRNA processing, and yeast FKH1 and FKH2. In those
  proteins the FHA domain is located N-terminal of the DNA-binding FH domain.

- Kinase-associated protein phosphatase (KAPP) from Arabidopsis thaliana, a protein which specifically interacts
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Consensus panomtGEHSAl-x4UVMJ(2HHUVMl-G-{GPl-x(2HSTAhx-P 

!o! S'?*'';®^'^ ^ ®^ - ' P- J- BacterioL 17^7211-7226(1990) 

1 2J Vblpes F.. Dyer M.. Scaife J.6.. Dart,y G.. Stan«,ers O.K.. Delves C.J. Gene 11^S(1992). 

[1913] 792. Phosphatidylinositol 3- and 4-kinases signatures 

P.-3.44>(2, and P..3.4.5^l) - is'no, yet at^J ^Tp^S" maX^rj slS'^ " 

cell Signalling. Currently, three forms o1 PI3-kinase are toown: ^ •»>es8engers h. 

- The mammalian enzyme which is a heteiodimer of a 110 Kd catalytic chain folio* anH oc i^^ u , „ 

- V^TORl/DfWi andTOR2«>RR2l2J. PI3-kinases required for cell cycle activation. Both are proteins of about 

" I^'^^^i^J* ^ '"^^'ved in vacuolar sorting and segregation. VPS34 is a protein of about loo ifH 

- Arabidopsis thaliana and soybean VPS34 homologs. -'^'saproieinoraboutlOOKd. 

following forms of PI4-kinases are known: irispnosphate. Currently the 

- Human PI4-kinase alpha.

- Yeast PIK1, a nuclear protein of 120 Kd.

- Yeast STT4, a protein of 214 Kd.

[1917] Four additional proteins belong to this family:

- Yeast hypothetical protein YHR099w, a distantly related member of this family.

- Fission yeast hypothetical protein SpAC22E12.16C.

[1918] Consensus pattern: [LIVMFAC]-K-x(1,3)-[DEA]-[DE]-[LIVMC]-R-Q-[DE]-x(4)-Q
consensus pattemIGS]-x-IAN^-x(3HUVM]-x(2HFYHl-[UVMl(2^x4LlvMFj:xJ-S^^^^^^ 

3 S^u i V T S^Wer U.. Oeuter-Reinhard M.. H^owa N.. Hall M.N. Ce 1 73 535-596(1993) 

! o ■ ^"^^"^ '^'^ W^'^rfi^W M O . Emr S O. Science 260 Se-giflSs? 

4 Garcia-Bustos J.R. Marini F.. Stevenson I.. Frei C. Hall M.N. EMBO J 132351-2^1(19^1 

[ 5,^^a«n E.... /.bars M.W.. Sh. T.B.. Ic^iKawa K.. Ke«h C.T.. Une W.S.l'sSfiefs.KL 369:756-758 

[ 6] Kato R., Ogawa H. Nucleic Acids Res. 22:3104-3112(1994).
!J!lm IfA I'^f^^ giycerol-3^,hosphate dehydrogenase signatures 

[1921] These enzymes are proteins of about 60 to 70 Kd which contain a probable FAD-binding domain in their N-
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Number of members: 581 
[1946] 

[1] Dyda F, Hickman AB, Jenkins TM, Engelman A, Craigie R, Davies DR; Medline: 95099322; "Crystal structure
of the catalytic domain of HIV-1 integrase: similarity to other polynucleotidyl transferases." Science 1994;266:
1981-1986.

[2] Lodi PJ, Ernst JA, Kuszewski J, Hickman AB, Engelman A, Craigie R, Clore GM, Gronenborn AM; Medline:
95359147; "Solution structure of the DNA binding domain of HIV-1 integrase." Biochemistry 1995;34:9826-9833.

[1947] 802. lig_chan
Ligand-gated ion channel

[1948] This family includes the four transmembrane regions of the ionotropic glutamate receptors and NMDA
receptors.

Number of members: 128

[1949] [1] Tong G, Shepherd D, Jahr CE; Medline: 95184014; "Synaptic desensitization of NMDA receptors by
calcineurin." Science 1995;267:1510-1512.
[1950] 803. RhoGAP 
RhoGAP domain 

[1951] GTPase activator proteins towards Rho/Rac/Cdc42-like small GTPases.

Number of members: 97

[1952]

[1] Musacchio A, Cantley LC, Harrison SC; Medline: 97121392; "Crystal structure of the breakpoint cluster
region-homology domain from phosphoinositide 3-kinase p85 alpha subunit." Proc Natl Acad Sci U S A 1996;93:
14373-14378.

[2] Barrett T, Xiao B, Dodson EJ, Dodson G, Ludbrook SB, Nurmahomed K, Gamblin SJ, Musacchio A, Smerdon
SJ, Eccleston JF; Medline: 97162209; "The structure of the GTPase-activating domain from p50rhoGAP." Nature
1997;385:458-461.

[3] Rittinger K, Walker PA, Eccleston JF, Nurmahomed K, Owen D, Laue E, Gamblin SJ, Smerdon SJ; Medline:
97404320; "Crystal structure of a small G protein in complex with the GTPase-activating protein rhoGAP." Nature
1997;388:693-697.

[4] Boguski MS, McCormick F; Medline: 94081948; "Proteins regulating Ras and its relatives." Nature 1993;366:
643-654.

[1953] 804. vwd 

von Willebrand factor type D domain 

[1954] [1] Bork P; Medline: 93327926; "The modular architecture of a new family of growth regulators related to
connective tissue growth factor." FEBS Lett 1993;327:125-130.

Number of members: 92

[1955] 805. zf-C4_Topoisom
Topoisomerase DNA binding C4 zinc finger

[1] Tse-Dinh YC, Beran-Steed RK; Medline: 89034032; "Escherichia coli DNA topoisomerase I is a zinc
metalloprotein with three repetitive zinc-binding domains." J Biol Chem 1988;263:15857-15859.
[2] Ahumada A, Tse-Dinh YC; Medline: 99011409; "The Zn(II) binding motifs of E. coli DNA topoisomerase I is part
of a high-affinity DNA binding domain." Biochem Biophys Res Commun 1998;251:509-514.

Number of members: 51 

[1956] 806. AIRC 
AIR carboxylase 
only if the kinase is phosphorylated on serine residues [2].

SPK1/SAD1 13^ The latter is the only known protein containing two copies ^FHA doo^r^^" 
Pro e„ kinase cdsl from fission yeast contains a FHA domain and i be the (xtt^CT^I 
- Protein kinase MEK1 from yeast, which is involved in meiotic recombination.
- Human nuclear antigen Ki67 which is expressed only in proliferating cells.
- Yeast hypothetical protein YHR115c, which contains a RING-finger C-terminal of the FHA domain.

fT^"^ '"^"^ '^^^ - ^'^--^ re1^CHem,inal o, Ihe 



- Caenorhabditis elegans hypothetical protein ZK632.2.

- Caenorhabditis elegans hypothetical protein C01G6.5.

■ InnLT!" P'°'^'y°'^ Anabaena. which contains a zinc-finger motif N-terminal of the FHA domain 

" meSSr ^^bT °" "'^ ^^^-"^ ^ P'^'^^ Kinase'Srr.'Lr.app-^g 

[ 1] Hofmann K.O., Bucher P. Trends Biochem. Sci. 20:347-349(1995).

[ 2] Stone J.M., Collinge M.A., Smith R.D., Horn M.A., Walker J.C. Science 266:793-795(1994).

[ 3] Navas T.A., Zhou Z., Elledge S.J. Cell 80:29-39(1995).

[1933] 797. Ald_Xan_dh_C

Aldehyde oxidase and xanthine dehydrogenase, C terminus

Number of members: 54

[1935] 798. Glyco_hydro_38
Glycosyl hydrolases family 38

[1936] Glycosyl hydrolases are key enzymes of carbohydrate metabolism.
Number of members: 20

[JS 799"he"ct ' ^' "^^^ Soc Trans 1998;26:153-156. 

HECT-domain (ubiquitin-transferase).

[1939] The name HECT comes from Homologous to the E6-AP Carboxyl Terminus.
Number of members: 43 

[1940] [1J Huibregtse JM. Scheffner M, Beaudenon S Howlev PM- MPdiinA- Q^oonoo^. * * ... 
HRDC domain 

L'rHR?c dS^^^ "^"^ ^ ^ ^^^'^ ^^^9. Mutates 

Number of members: 19 

[1944] 801. Integrase 

[1945] Integrase mediates integration of a DNA copy of the viral genome into the host chromosome.



[ 2] Tamkun J.W., Deuring R., Scott M.P., Kissinger M., Pattatucci A.M., Kaufman T.C., Kennison J.A. Cell 68:
561-572(1992).

[ 3] Tamkun J.W. Curr. Opin. Genet. Dev. 5:473-477(1995).

[1963] 808. (CH) Actinin-type acth-binding domain signatures 
PROSITE cross-reference(s): PS00019; ACTININ^I. PS00020; ACTININ_2 

[1964] Alpha-actinin is an F-actin cross-linking protein which is thought to anchor actin to a variety of intracellular
structures [1]. The actin-binding domain of alpha-actinin seems to reside in the first 250 residues of the protein. A
similar actin-binding domain has been found in the N-terminal region of many different actin-binding proteins [2,3]:

- In the beta chain of spectrin (or fodrin).

- In dystrophin, the protein defective in Duchenne muscular dystrophy (DMD) and which may play a role in anchoring
the cytoskeleton to the plasma membrane.

- In the slime mold gelation factor (or ABP-120).

- In actin-binding protein ABP-280 (or filamin), a protein that links actin filaments to membrane glycoproteins.

- In fimbrin (or plastin), an actin-bundling protein. Fimbrin differs from the above proteins in that it contains two
tandem copies of the actin-binding domain and that these copies are located in the C-terminal part of the protein.

[1965] Two conserved regions were selected as signature patterns for this type of domain. The first of these regions is
located at the beginning of the domain, while the second one is located in the central section and has been shown to
be essential for the binding of actin.
[1966] Consensus pattern: [EQ]-x(2)-[ATV]-[FY]-x(2)-W-x-N

Consensus pattern: [LIVM]-x-[SGN]-[LIVM]-[DAGHE]-[SAG]-x-[DNEAG]-[LIVM]-x-[DEAG]-x(4)-[LIVM]-x-[LM]-[SAG]-
[LIVM]-[LIVMT]-W-x-[LIVM](2)

[ 1] Schleicher M., Andre E., Harmann A., Noegel A.A. Dev. Genet. 9:521-530(1988).
[ 2] Matsudaira P. Trends Biochem. Sci. 16:87-92(1991).
[ 3] Dubreuil R.R. BioEssays 13:219-226(1991).

[1967] 809. (COX1) Heme-copper oxidase subunit I, copper B binding region signature
PROSITE cross-reference(s): PS00077; COX1

Heme-copper respiratory oxidases [1] are oligomeric integral membrane protein complexes that catalyze the terminal
step in the respiratory chain: they transfer electrons from cytochrome c or a quinol to oxygen. Some terminal oxidases
generate a transmembrane proton gradient across the plasma membrane (prokaryotes) or the mitochondrial inner
membrane (eukaryotes). The enzyme complex consists of 3-4 subunits (prokaryotes) up to 13 polypeptides (mammals),
of which only the catalytic subunit (equivalent to mammalian subunit 1 (CO I)) is found in all heme-copper respiratory
oxidases. The presence of a bimetallic center (formed by a high-spin heme and copper B) as well as a low-spin heme,
both ligated to six conserved histidine residues near the outer side of four transmembrane spans within CO I, is common
to all family members [2-4].

[1968] In contrast to eukaryotes, the respiratory chain of prokaryotes is branched to multiple terminal oxidases. The
enzyme complexes vary in heme and copper composition, substrate type and substrate affinity. The different respiratory
oxidases allow the cells to customize their respiratory systems according to a variety of environmental growth conditions
[1].

[1969] Recently a component of an anaerobic respiratory chain has also been found to contain the copper B binding
signature of this family: nitric oxide reductase (NOR) exists in denitrifying species of Archaea and Eubacteria.
[1970] Enzymes that belong to this family are:

- Mitochondrial-type cytochrome c oxidase (EC 1.9.3.1), which uses cytochrome c as electron donor. The electrons
are transferred via copper A (Cu(A)) and heme a to the bimetallic center of CO I that is formed by a penta-coordinated
heme a and copper B (Cu(B)). Subunit 1 contains 12 transmembrane regions. Cu(B) is said to be ligated
to three of the conserved histidine residues within the transmembrane segments 6 and 7.

- Quinol oxidase from prokaryotes that transfers electrons from a quinol to the binuclear center of polypeptide I. This
category of enzymes includes Escherichia coli cytochrome O terminal oxidase complex, which is a component of
the aerobic respiratory chain that predominates when cells are grown at high aeration.

- FixN, the catalytic subunit of a cytochrome c oxidase expressed in nitrogen-fixing bacteroids living in root nodules.
The high affinity for oxygen allows oxidative phosphorylation under low oxygen concentrations. A similar enzyme
has been found in other purple bacteria.

- Nitric oxide reductase (EC 1.7.99.7) from Pseudomonas stutzeri. NOR reduces nitrate to dinitrogen. It is a het- 
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Members of this family catalyse the decaft)Oxylation of 1 K5i>hosphoribosyO-5^tno^.imida2oleK:aft)oxylate (AIR). 

This family catalyses the sixth step of de novo purine biosynthesis. Some members of this family contain two copies of
this domain. Number of members: 35

[1957] 807. Bromodomain signature and profile

PROSITE cross-reference(s): PS00633; BROMODOMAIN_1, PS50014; BROMODOMAIN_2

The bromodomain [1,2,3] is a conserved region of about 70 amino acids found in the following proteins:

- Higher eukaryotes transcription initiation factor TFIID 250 Kd subunit (TBP-associated factor p250) (gene CCG1).
P250 is associated with the TFIID TATA-box binding protein and seems essential for progression of the G1 phase
of the cell cycle.

- Human RING3, a protein of unknown function encoded in the MHC class II locus.

- Mammalian CREB-binding protein (CBP), which mediates cAMP-gene regulation by binding specifically to
phosphorylated CREB protein.

- Drosophila female sterile homeotic protein (gene fsh), required maternally for proper expression of other homeotic
genes involved in pattern formation, such as Ubx.

- Drosophila brahma protein (gene brm), a protein required for the activation of multiple homeotic genes.

- Mammalian homologs of brahma. In human, three brahma-like proteins are known: SNF2a (hBRM), SNF2b and
BRG1.

- Human BS69, a protein that binds to adenovirus E1A and inhibits E1A transactivation.

- Human peregrin (or Br140).

- Yeast BDF1 [3], a transcription factor involved in the expression of a broad class of genes including snRNAs.

- Yeast GCN5, a general transcriptional activator operating in concert with certain other DNA-binding transcriptional
activators, such as GCN4, HAP2/3/4 or ADA2.

- Yeast NPS1/STH1, involved in G(2) phase control in mitosis.

- Yeast SNF2/SWI2, which is part of a complex with the SNF5, SNF6, SWI3 and ADR6/SWI1 proteins. This SWI-
complex is involved in transcriptional activation.

- Yeast SPT7, a transcriptional activator of Ty elements and possibly other genes.

- Caenorhabditis elegans protein cbp-1.

- Yeast hypothetical protein YGR056w.

- Yeast hypothetical protein YKR008w.

- Yeast hypothetical protein L9638.1.

[1958] Some proteins contain a region which, while similar to some extent to a classical bromodomain, diverges from
it by either lacking part of the domain or because of an insertion. These proteins are:

- Mammalian protein HRX (also known as All-1 or MLL), a protein involved in translocations leading to acute leukemias
and which possibly acts as a transcriptional regulatory factor. HRX contains a region similar to the C-terminal
half of the bromodomain.

- Caenorhabditis elegans hypothetical protein ZK783.4. The bromodomain of this protein has a 23 amino acid insertion.

- Yeast protein YTA7. This protein contains a region with significant similarity to the C-terminal half of the bromodomain.
As it is a member of the AAA family (see <PDOC00572>) it is also in a functionally different context.

[1959] The above proteins generally contain a single bromodomain, but some of them contain two copies; this is the
case of BDF1, CCG1, fsh, RING3, YKR008w and L9638.1.

[1960] The exact function of this domain is not yet known, but it is thought to be involved in protein-protein interactions
and it may be important for the assembly or activity of multicomponent complexes involved in transcriptional activation.
[1961] The consensus pattern that has been developed spans a major part of the bromodomain; a more sensitive
detection is available through the use of a profile which spans the whole domain.

Consensus pattern: [STANVF]-x(2)-F-x(4)-[DNS]-x(5,7)-[DENQTF]-Y-[HFY]-x(2)-[LIVMFY]-x(3)-[LIVM]-x(4)-
[LIVM]-x(6,8)-Y-x(12,13)-[LIVM]-x(2)-N-[SACF]-x(2)-[FY]

References
[1962]

[ 1] Haynes S.R., Dollard C., Winston F., Beck S., Trowsdale J., Dawid I.B. Nucleic Acids Res. 20:2603-2603
(1992).
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residues. 

[1981] Signature patterns were developed for both conserved regions.

[1982] Consensus pattern: [EDQH]-x-K-x-[DN]-G-x-R-[GACIVM] [K is the active site residue]

[1983] Consensus pattern: E-G-[LIVMA]-[LIVM](2)-[KR]-x(5,8)-[YW]-[QNEK]-x(2,6)-[KRH]-x(3,5)-K-[LIVMFY]-K

Sequences known to belong to this class detected by the pattern: ALL, except for archaebacterial DNA ligases.



[1]
Tomkinson A.E., Totty N.F., Ginsburg M., Lindahl T.
Proc. Natl. Acad. Sci. U.S.A. 88:400-404(1991).

[2]
Lindahl T., Barnes D.E.
Annu. Rev. Biochem. 61:251-281(1992).

[3]
Kletzin A.
Nucleic Acids Res. 20:5389-5396(1992).



[1984] 812. (FAD_Gly3P_dh) FAD-dependent glycerol-3-phosphate dehydrogenase signatures
PROSITE cross-reference(s): PS00977; FAD_G3PDH_1, PS00978; FAD_G3PDH_2

[1985] FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) (GPD) catalyzes the conversion of
glycerol-3-phosphate into dihydroxyacetone phosphate. In bacteria [1] it is associated with the utilization of glycerol coupled
to respiration. In Escherichia coli, two isozymes are known: one expressed under anaerobic conditions (gene glpA)
and one in aerobic conditions (gene glpD). In eukaryotes, a mitochondrial form of GPD participates in the glycerol
phosphate shuttle in conjunction with an NAD-dependent cytoplasmic GPD (EC 1.1.1.8) [2,3].
[1986] These enzymes are proteins of about 60 to 70 Kd which contain a probable FAD-binding domain in their
N-terminal extremity. The mammalian enzyme differs from the bacterial or yeast proteins by having an EF-hand
calcium-binding region (see <PDOC00018>) in its C-terminal extremity.

[1987] Two signature patterns were developed: one based on the first half of the FAD-binding domain and one which
corresponds to a conserved region in the central part of these enzymes.
[1988] Consensus pattern: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G
Consensus pattern: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A

[1]
Austin D., Larson T.J.
J. Bacteriol. 173:101-107(1991).

[2]
Roennow B., Kielland-Brandt M.C.
Yeast 9:1121-1130(1993).

[3]
Brown L.J., McDonald M.J., Lehn D.A., Moran S.M.
J. Biol. Chem. 269:14363-14366(1994).



[1989] 813. (Fapy_DNA_glyco) Formamidopyrimidine-DNA glycosylase signature
PROSITE cross-reference(s): PS01242; FPG

[1990] Formamidopyrimidine-DNA glycosylase (EC 3.2.2.23) [1] (Fapy-DNA glycosylase) (gene fpg) is a bacterial
enzyme involved in DNA repair which excises oxidized purine bases to release 2,6-diamino-4-hydroxy-5N-methylformamidopyrimidine
(Fapy) and 7,8-dihydro-8-oxoguanine (8-oxoG) residues. In addition to its glycosylase activity,
FPG can also nick DNA at apurinic/apyrimidinic sites (AP sites). FPG is a monomeric protein of about 32 Kd which
binds and requires zinc for its activity.

[1991] The binding site for zinc seems to be located in the C-terminal part of the enzyme, where four conserved and
essential [2] cysteines are located. A signature pattern was developed based on this region.

[1992] Consensus pattern: C-x(2,4)-C-x-[GTAQ]-x-[IV]-x(7)-R-[GSTAN]-[STA]-x-[FYI]-C-x(2)-C-Q

[The four C's are putative zinc ligands]
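The following is a minimal sketch, not part of the original disclosure, of how the four putative zinc ligands could be located once the signature is written as a regular expression with one named group per cysteine. The function name and the test sequence are hypothetical; the sequence is synthetic and was built only to satisfy the pattern.

```python
import re

# Hand-translated form of the Fapy-DNA glycosylase signature.
# Named groups c1..c4 mark the four cysteines (putative zinc ligands).
FPG_SIGNATURE = re.compile(
    r"(?P<c1>C).{2,4}(?P<c2>C).[GTAQ].[IV].{7}R[GSTAN][STA].[FYI]"
    r"(?P<c3>C).{2}(?P<c4>C)Q"
)

def zinc_ligand_positions(sequence):
    """Return 1-based positions of the four signature cysteines, or [] if no hit."""
    match = FPG_SIGNATURE.search(sequence)
    if match is None:
        return []
    return [match.start(name) + 1 for name in ("c1", "c2", "c3", "c4")]

# Synthetic sequence (assumption: illustrative only, not a real FPG protein).
synthetic = "MACPRCEGKIQKTLDGSRGTHFCPECQK"
print(zinc_ligand_positions(synthetic))  # [3, 6, 23, 26]
```

Replacing any one of the four cysteines in the synthetic sequence makes the search fail, which mirrors the statement that all four are conserved.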

[1]
Duwat P., de Oliveira R., Ehrlich S.D., Boiteux S.
Microbiology 141:411-417(1995).

[2]
O'Connor T.E., Graves R.J., Demurcia G., Castaing B., Laval J.
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:2Z:'s':^rsrr'''^"'^"'^^ ^^-^^ thee .va^.Hls«..„e .s^u^s^^^t^ 

Consensus pattern: [YWG]-[LIVFYWTA](2)-[VGS]-H-[LNP]-x-V-x(44,47)-H-H [The three H's are copper B ligands]
[1973] Note: cytochrome bd complexes do not belong to this family.
[1]
Garcia-Horsman J.A., Barquera B., Rumbley J., Ma J., Gennis R.B. J. Bacteriol. 176:5587-5600(1994).
[2]
Castresana J., Luebben M., Saraste M., Higgins D.G. EMBO J. 13:2516-2525(1994).
[3]
Capaldi R.A., Malatesta F., Darley-Usmar V.M.
Biochim. Biophys. Acta 726:135-148(1983).
[4]
Holm L., Saraste M., Wikstrom M.
EMBO J. 6:2819-2823(1987).
[5]
Saraste M., Castresana J.
FEBS Lett. 341:1-4(1994).

Biochim. Biophys. Acta 1057:157-185(1991).

ATP-dependent DNA ligase signatures

PROSITE cross-reference(s): PS00697; DNA_LIGASE_A1, PS00333; DNA_LIGASE_A2

rn=oS— ^^^^^ ^.^^.s by cata.y.ing 

[1979] Eukaryotic, archaebacterial, virus and phage DNA ligases are ATP-dependent. During the first step of the
joining reaction, the ligase interacts with ATP to form a covalent enzyme-adenylate intermediate. A conserved lysine
residue is the site of adenylation [1,2].
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Consensus pattemL-x-F-L-H-x-Y-H-H 
[1]
Bairoch A.
Unpublished observations (1996).

[2]
El-Sherbeini M., Clemas J.A.
J. Bacteriol. 177:3227-3234(1995).

[3]
Garcia-Arranz M., Maldonado A.M., Mazon M.J., Portillo F.
J. Biol. Chem. 269:18076-18082(1994).

[2006] 817. Immunoglobulins and major histocompatibility complex proteins signature
PROSITE cross-reference(s): PS00290; IG_MHC

[2007] The basic structure of immunoglobulin (Ig) [1] molecules is a tetramer of two light chains and two heavy chains
linked by disulfide bonds. There are two types of light chains: kappa and lambda, each composed of a constant domain
(CL) and a variable domain (VL). There are five types of heavy chains: alpha, delta, epsilon, gamma and mu, all
consisting of a variable domain (VH) and three (in alpha, delta and gamma) or four (in epsilon and mu) constant domains
(CH1 to CH4).

[2008] The major histocompatibility complex (MHC) molecules are made of two chains. In class I [2] the alpha chain
is composed of three extracellular domains, a transmembrane region and a cytoplasmic tail. The beta chain
(beta-2-microglobulin) is composed of a single extracellular domain. In class II [3], both the alpha and the beta chains are
composed of two extracellular domains, a transmembrane region and a cytoplasmic tail.

[2009] It is known [4,5] that the Ig constant chain domains and a single extracellular domain in each type of MHC
chain are related. These homologous domains are approximately one hundred amino acids long and include a conserved
intradomain disulfide bond. A small pattern around the C-terminal cysteine involved in this disulfide bond
can be used to detect this category of Ig-related proteins.

[2010] Consensus pattern: [FY]-x-C-x-[VA]-x-H
Sequences known to belong to this class detected by the pattern:

- Ig heavy chains type Alpha C region: All, in CH2 and CH3.
- Ig heavy chains type Delta C region: All, in CH3.
- Ig heavy chains type Epsilon C region: All, in CH1, CH3 and CH4.
- Ig heavy chains type Gamma C region: All, in CH3 and also CH1 in some cases.
- Ig heavy chains type Mu C region: All, in CH2, CH3 and CH4.
- Ig light chains type Kappa C region: In all CL except rabbit and Xenopus.
- Ig light chains type Lambda C region: In all CL except rabbit.
- MHC class I alpha chains: All, in alpha-3 domains, including in the cytomegalovirus MHC-1 homologous protein [6].
- Beta-2-microglobulin: All.
- MHC class II alpha chains: All, in alpha-2 domains.
- MHC class II beta chains: All, in beta-2 domains.
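Because a single chain can carry the signature in several constant domains (for example CH2 and CH3), a scan should report every hit rather than only the first. The sketch below is one way to do this, under the assumption that the pattern is hand-translated to a regular expression and wrapped in a lookahead so that overlapping hits are not skipped. The function name and the test sequence are hypothetical, and the sequence is synthetic.

```python
import re

# [FY]-x-C-x-[VA]-x-H, wrapped in a lookahead so every start position is tested.
IG_MHC = re.compile(r"(?=([FY].C.[VA].H))")

def ig_mhc_hits(sequence):
    """Return (1-based position of the conserved Cys, matched text) for each hit."""
    # The cysteine is the third residue of the pattern: 0-based start + 2.
    return [(m.start() + 3, m.group(1)) for m in IG_MHC.finditer(sequence)]

# Synthetic two-hit sequence (assumption: illustrative only).
synthetic = "AAYACEVTHQGLSSAAAFTCRVNHKPS"
print(ig_mhc_hits(synthetic))  # [(5, 'YACEVTH'), (20, 'FTCRVNH')]
```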

[1]
Gough N.
Trends Biochem. Sci. 6:203-205(1981).

[2]
Klein J., Figueroa F.
Immunol. Today 7:41-44(1986).

[3]
Figueroa F., Klein J.
Immunol. Today 7:78-81(1986).

[4]
Orr H.T., Lancet D., Robb R.J., Lopez de Castro J.A., Strominger J.L.
Nature 282:266-270(1979).

[5]
Cushley W., Owen M.J.
Immunol. Today 4:88-92(1983).

[6]
Beck S., Barrell B.G.
Nature 331:269-272(1988).

[2011] 818. (IGFBP) Insulin-like growth factor binding proteins signature
PROSITE cross-reference(s): PS00222; IGF_BINDING

[2012] The insulin-like growth factors (IGF-I and IGF-II) bind to specific binding proteins in extracellular fluids with
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J. Biol. Chem. 268:9063-9070(1993), 

SS.?JSr!sreP?lrsr'^ Gamma-gimamytoanspeptidase signature PROSITE cross^e.erenco(s): PS00462; 

S!!tLSrT'^"^^'^^''^'"^^ ^•^•^2> 111 •ha transfer Of the gamma-glutamvl moietv 

of glutathione to an acceptor that may be an amino acid, a peptide or water (fomiing glutamateT^Tl^sTkrrS 

«es. rt » an enzyme that consets of two polypeptide chains, a heavy and a light subunii. proce3t,^a sSl 
^n precuiBor The active site of GGT is known to l>e kxated in the light subunl 

11995] The sequences of mammalian and bacterial GGT show a number of regions of hioh similaritv f21 P^.. 
domonas cephalosporin acybses (EC 3.5.,.-) that convert 74,etaK4<a,baxybutanS^,^£CS:aiJSL 

sOTTOtaui activiy pj. tjke GGT. these GL-7ACA acylases. are also composed of two subunits 

il, ^- "^sfved regions correspond to the N-terminal extremity of the mature lioht chains of th«5« 
enzymes. This regkxi was used as a signature panem ^ ^ 

11997] Consensus (»attemT^STAH^-x^ST^IU^fl^^/g-x(4)-G^SN^x-V^STA^x-T-x-^^^ ,2)^FY]-G 
[1]
Tate S.S., Meister A.
Meth. Enzymol. 113:400-419(1985).

[2]
Suzuki H., Kumagai H., Echigo T., Tochikura T.
J. Bacteriol. 171:5169-5172(1989).

[3]
Ishiye M., Niwa M.
Biochim. Biophys. Acta 1132:233-239(1992).

[1998] 815. G-protein gamma subunit profile 

PROSITE cross-reference(s): PS50058; G_PROTEIN_GAMMA 

S h^^r"'"^ ""^^^^^^"S Pf°»«ins (Q proteins) [1] act as intem.ediaries in the transductkx, of slqnals oen 

for the repla^2^ GoT^i^?^ T '''' '° 

V, ""^ oy v> I r as wen as for membrane anchonng and receptor recognilwn 

SoS,wi?r! ^"^"""^ <"°^ '° "0 that are bound to the membrane via a 

3T2^r;n;trr:::r^^^^^^^^^ -^^-^ ^° - ~s Se^ i 

S-.ii: dS:""^"^ ^'^^^ ^ '^^^^ ^^-'^ ^*9-'«n9. -tains a G-pro.ein 

[2002] A profile was developed that spans the complete length of the gamma subunit.

Pennington S.R. 

Protein Prof. 2:16-315(1995). 

[2003] 816. GNS1/SUR4 family signature 

PROSITE cross-reference(s): PS01188; GNS1_SUR4

- Yeast GNS1 [2], a protein involved in synthesis of 1,3-beta-glucan.

' ITJ^^J" ' ^ ^ "^"'^ « glucose-signaling pathway that controls the expres 

sion of several genes that are transcriptionally regulated by glucose conirois me expres- 

- Yeast hypothetical protein YJL196c.

- Caenorhabditis elegans hypothetical protein C40H1.4.

- Caenorhabditis elegans hypothetical protein D2024.3. 

iTtLil^f ''"^ ^'"•'^^ Structuralh^. they seem tobe formed of three sectbns 
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- GTP-binding elongation factors (EF-Tu. EF-1 alpha. EF-G, EF-2. etc.). 

- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).

- Nuclear protein ran (see <PDOC00859>).

- ADP-ribosylation factors family (see <PDOC00781>).

- Bacterial dnaA protein (see <PDOC00771>).

- Bacterial recA protein (see <PDOC00131>).

- Bacterial recF protein (see <PDOC00539>).

- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, Go, etc.).

- DNA mismatch repair proteins mutS family (see <PDOC00388>).

- Bacterial type II secretion system protein E (see <PDOC00567>).

[2021] Not all ATP- or GTP-binding proteins are picked up by this motif. A number of proteins escape detection
because the structure of their ATP-binding site is completely different from that of the P-loop. Examples of such proteins
are the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins the flexible loop exists in a
slightly different form; this is the case for tubulins or protein kinases. A special mention must be reserved for adenylate
kinase, in which there is a single deviation from the P-loop pattern: in the last position Gly is found instead of Ser or Thr.
[2022] Consensus pattern: [AG]-x(4)-G-K-[ST]
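The consensus patterns in this description use PROSITE notation: `x` is any residue, `[..]` lists allowed residues, `{..}` lists excluded residues, and `(n)` or `(n,m)` gives a repeat count. As a minimal sketch, assuming only these elements occur, such a pattern can be translated into a regular expression and used to scan a sequence. The helper names are hypothetical, and both test sequences are short illustrative strings rather than database entries; the second one carries Gly in the last position, as described above for adenylate kinase.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip(".").split("-"):
        element = element.strip()
        # Split off an optional repeat count such as (4) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                        # x: any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # {..}: any residue except these
        # [..] and single letters are already valid regex atoms.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)

def find_motif(pattern, sequence):
    """Return (1-based start, matched text) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]

P_LOOP = "[AG]-x(4)-G-K-[ST]"
print(prosite_to_regex(P_LOOP))                       # [AG].{4}GK[ST]
print(find_motif(P_LOOP, "MTEYKLVVVGAGGVGKSALTIQ"))   # [(10, 'GAGGVGKS')]
print(find_motif(P_LOOP, "MRIILLGAPGSGKGTQ"))         # []
```

The empty result for the second sequence illustrates why a protein with a single deviation in the last position escapes detection by the strict pattern.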

[1]
Walker J.E., Saraste M., Runswick M.J., Gay N.J.
EMBO J. 1:945-951(1982).

[2]
Moller W., Amons R.
FEBS Lett. 186:1-7(1985).

[3]
Fry D.C., Kuby S.A., Mildvan A.S.
Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).

[4]
Dever T.E., Glynias M.J., Merrick W.C.
Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).

[5]
Saraste M., Sibbald P.R., Wittinghofer A.
Trends Biochem. Sci. 15:430-434(1990).

[6]
Koonin E.V.
J. Mol. Biol. 229:1165-1174(1993).

[7]
Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P.
J. Bioenerg. Biomembr. 22:571-592(1990).

[8]
Hodgman T.C.
Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).

[9]
Linder P., Lasko P., Ashburner M., Leroy R., Nielsen P.J., Nishi K.,
Schnier J., Slonimski P.P.
Nature 337:121-122(1989).

[10]
Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M.
Nucleic Acids Res. 17:4713-4730(1989).

[2023] 821. PE: PE family

This family is named after a PE motif near the amino terminus of the domain. The PE family of proteins all contain an
amino-terminal region of about 110 amino acids. The carboxyl termini of this family are variable and fall into several
classes. The largest class of PE proteins is the highly repetitive PGRS class, which has a high glycine content. The
function of these proteins is uncertain, but it has been suggested that they may be related to antigenic variation of
Mycobacterium tuberculosis [1]. Number of members: 88

[2024] [1] Medline: 98295987. Deciphering the biology of Mycobacterium tuberculosis from the complete genome
sequence. Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE
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10 






inha,it or stia,ula.e me effecte'of the^LFTreSS SJl C s^^TaS^ia 7" 'iT^' 

«jeir cell surtaoe receptors. There are a. leas, six diBeren. .GFBpH^ ^^S^^^ZT'^" 

- Mouse protein cyr61 and its probable chicken homolog, protein CEF-10.

- Connective tissue growth factor (CTGF) and its mouse homolog, protein FISP-12.

- Vertebrate protein NOV.

[2014] As a signature pattern, a conserved cysteine-rich region located in the N-terminal section of these proteins is used.
[2015] Consensus pattern: G-C-[GS]-C-C-x(2)-C-A-x(6)-C

Sequences known to belong to this class detected by the pattern: ALL, except for IGFBP-6's.



[1]
Rechler M.M.
Vitam. Horm. 47:1-114(1993).

[2]
Shimasaki S., Ling N.
Prog. Growth Factor Res. 3:243-266(1991).

[3]
Clemmons D.R.
Trends Endocrinol. Metab. 1:412-417(1990).

[4]
Bradham D.M., Igarashi A., Potter R.L., Grotendorst G.R.
J. Cell Biol. 114:1285-1294(1991).

[5]
Maloisel V., Martinerie C., Dambrine G., Plassiart G., Brisac M., Crochet
J., Perbal B.
Mol. Cell. Biol. 12:10-21(1992).






[2018] 820. (myosin_head) ATP/GTP-binding site motif A (P-loop)
PROSITE cross-reference(s): PS00017; ATP_GTP_A

[2019] From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an
appreciable proportion of proteins that bind ATP or GTP

- ATP synthase alpha and beta subunits (see <PDOC00137>).

- Myosin heavy chains.

- Kinesin heavy chains and kinesin-like proteins (see <PDOC00343>).

- Dynamins and dynamin-like proteins (see <PDOC00362>).

- Guanylate kinase (see <PDOC00670>).

- Thymidine kinase (see <PDOC00524>).

- Thymidylate kinase (see <PDOC01034>).

- Shikimate kinase (see <PDOC00868>).

- Nitrogenase iron protein family (nifH/frxC) (see <PDOC00580>).

: srr R^h^s:^^^^^^^^^ ^^^^^ •-^--) ^ <poocooi85>, 
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Mammalian Ras GTPase-activaling protein (GAP). 

- Adaptor proteins mediating binding of guanine nucleotide exchange factors to growth factor receptors: vertebrate
GRB2, Caenorhabditis elegans sem-5 and Drosophila DRK.

- Mammalian Vav oncoprotein, a guanine-nucleotide exchange factor of the CDC24 family.

- Miscellaneous proteins interacting with vertebrate receptor protein tyrosine kinases: oncoprotein Crk, mammalian
cytoplasmic proteins Nck, Shc.

- STAT proteins (signal transducers and activators of transcription).

- Chicken tensin.

- Yeast transcriptional control protein SPT6.

[2033] The profile developed to detect SH2 domains is based on a structural alignment consisting of 8 gap-free
blocks and 7 linker regions totaling 92 match positions.

[1]
Sadowski I., Stone J.C., Pawson T.
Mol. Cell. Biol. 6:4396-4408(1986).

[2]
Russell R.B., Breed J., Barton G.J.
FEBS Lett. 304:15-20(1992).

[3]
Marengere L.E.M., Pawson T.
J. Cell Sci. Suppl. 18:97-104(1994).

[4]
Pawson T., Schlessinger J.
Curr. Biol. 3:434-442(1993).

[5]
Mayer B.J., Baltimore D.
Trends Cell. Biol. 3:8-13(1993).

[6]
Pawson T.
Nature 373:573-580(1995).

[7]
Kuriyan J., Cowburn D.
Curr. Opin. Struct. Biol. 3:828-837(1993).

[2034] 824. Sulfate transporters signature

PROSITE cross-reference(s): PS01130; SULFATE_TRANSP

[2035] A number of proteins involved in the transport of sulfate across a membrane, as well as some as yet uncharacterized
proteins, have been shown [1,2] to be evolutionarily related. These proteins are:

- Neurospora crassa sulfate permease II (gene cys-14).

- Yeast sulfate permeases (genes SUL1 and SUL2).

- Rat sulfate anion transporter 1 (SAT-1).

- Mammalian DTDST, a probable sulfate transporter which, in human, is involved in the genetic disease diastrophic
dysplasia (DTD).

- Sulfate transporters 1, 2 and 3 from the legume Stylosanthes hamata.

- Human pendrin (gene PDS), which is involved in a number of hearing loss genetic diseases.

- Human protein DRA (Down-Regulated in Adenoma).

- Soybean early nodulin 70.

- Escherichia coli hypothetical protein ychM.

- Caenorhabditis elegans hypothetical protein F41D9.5.

[2036] As expected by their transport function, these proteins are highly hydrophobic and seem to contain about 12
transmembrane domains. The best conserved region seems to be located in the second transmembrane region and
is used as a signature pattern.

[2037] Consensus pattern: [PAV]-x-Y-[GS]-L-Y-[STAG](2)-x(4)-[LIVFYA]-[LIVST]-[YI]-x(3)-[GA]-[GST]-S-[KR]
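The statement that these proteins are highly hydrophobic with about 12 transmembrane domains is the kind of prediction usually made from a hydropathy profile. The sketch below is one common way to compute such a profile and is not taken from the original text: it uses the Kyte-Doolittle scale with a 19-residue sliding window and a cutoff of 1.6, both of which are assumed, commonly quoted heuristic values. The function names and the test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values per residue.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in sequence]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def candidate_tm_segments(sequence, window=19, threshold=1.6):
    """Merge windows scoring at or above the threshold into 1-based (start, end) spans."""
    segments = []
    for i, score in enumerate(hydropathy_profile(sequence, window)):
        if score >= threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1] + 1:
                segments[-1] = (segments[-1][0], end)   # extend the previous span
            else:
                segments.append((start, end))
    return segments

# Synthetic test: one 21-residue leucine stretch flanked by charged residues.
synthetic = "DEKR" * 5 + "L" * 21 + "DEKR" * 5
print(candidate_tm_segments(synthetic))
```

Counting the spans returned for a real transporter sequence gives a rough estimate of the number of membrane-spanning segments; the result depends on the chosen window and cutoff.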
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3rd. Tekaia F. Badcock K. Basham D. Brown 0. Chillingwrth T, Connor R Davies R. Devlin K. FeltweO T Gentles s 
Hamlin N, Holroyd S, Hornsby T, Jagels K, Barrell BG, et al. Nature 1998;393:537-544.
[2025] 822. (RNB) Ribonuclease II family signature
PROSITE cross-reference(s): PS01175; RIBONUCLEASE_II

[2026] On the basis of sequence similarities, the following bacterial and eukaryotic proteins seem to form a family:

- Escherichia coli and related bacteria ribonuclease II (EC 3.1.13.1) (RNase II) (gene rnb) [1]. RNase II is an exo-

- Yeast protein SSD1 (or SRK1), which is implicated in the control of the cell cycle G1 phase.

- Yeast protein DIS3 [2], which binds to ran (GSP1) and enhances the nucleotide-releasing activity of RCC1 on ran.

- Fission yeast protein dis3, which is implicated in mitotic control.

- Neurospora crassa cyt-4, a mitochondrial protein required for RNA 5' and 3' end processing and splicing.

- Yeast protein MSU1, which is involved in mitochondrial biogenesis.

" SteSte^**^ ^"^^ ^ ^ ^ "^''"^ '^^^^ ^° ^ «=3*on« anhydrase inhibitor aceta- 

- Caenorhabditis elegans hypothetical protein F48E8.6.

S .^^ ^'"k °' "'^'^'"^ '"^ ^ 1250 (SSD1). While their sequence is highly 

divergent they share a conserved domain in their C-terminal section [4]. It is possible that this domain plays a role in
a putative exonuclease function that would be common to all these proteins. A signature pattern was developed based
on the core of this conserved domain.



[1]
Zilhao R., Camelo L., Arraiano C.M.
Mol. Microbiol. 8:43-51(1993).

[2]
Noguchi E., Hayashi N., Azuma Y., Seki T., Nakamura M., Nakashima N.,
Yanagida M., He X., Mueller U., Sazer S., Nishimoto T.
EMBO J. 15:5595-5605(1996).

[3]
Beuf L., Bedu S., Cami B., Joset F.
Plant Mol. Biol. 27:779-788(1995).

[4]
Mian I.S.
Nucleic Acids Res. 25:3187-3195(1997).

[2029] 823. Src homology 2 (SH2) domain profile
PROSITE cross-reference(s): PS50001; SH2 



[2030] The Src homology 2 (SH2) domain is a protein domain of about 100 amino-acid residues first identified as a
conserved sequence region between the oncoproteins Src and Fps [1]. Similar sequences were later found in many
other intracellular signal-transducing proteins [2]. SH2 domains function as regulatory modules of intracellular signalling
cascades by interacting with high affinity with phosphotyrosine-containing target peptides in a sequence-specific and
strictly phosphorylation-dependent manner [3,4,5,6].

[2031] The SH2 domain has a conserved 3D structure consisting of two alpha helices and six to seven beta-strands.
The core of the domain is formed by a continuous beta-meander composed of two connected beta-sheets [7].

[2032] So far, SH2 domains have been identified in the following proteins:

- Cytoplasmic (non-receptor) protein tyrosine kinases, in particular in
the Src, Abl, Btk, Csk and ZAP70 families of kinases.

- Mammalian phosphatidylinositol-specific phospholipase C gamma-1 and -2. Two copies of the SH2 domain are
found in those proteins in between the catalytic 'X-' and 'Y-boxes' (see <PDOC50007>).

- Mammalian phosphatidylinositol 3-kinase regulatory p85 subunit.

- Some vertebrate and invertebrate protein-tyrosine phosphatases.
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Bacillus subtilis hypothetical protein ypsC. 

Synechocystis strain PCC 6803 hypothetical protein slr0064. 

Methanococcus jannaschii hypothetical proteins MJ0438 and MJ0710. 

[2047] These are hydrophilic proteins of 40 Kd to about 80 Kd. They can be picked up in the database by the
following pattern.

[2048] Consensus pattern: D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E
References:

[2049] [ 1] Bairoch A. Unpublished observations (1997).
[2050] 830. Uncharacterized protein family UPF0031 signatures 

PROSITE cross-reference(s): PS01049; UPF0031_1, PS01050; UPF0031_2

The following uncharacterized proteins have been shown [1] to share regions of similarities:

- Yeast chromosome XI hypothetical protein YKL151c.
- Caenorhabditis elegans hypothetical protein R107.2.
- Escherichia coli hypothetical protein yjeF.
- Bacillus subtilis hypothetical protein yxkO.
- Helicobacter pylori hypothetical protein HP1363.
- Mycobacterium tuberculosis hypothetical protein MtCY77.05c.
- Mycobacterium leprae hypothetical protein B229_C2_201.
- Synechocystis strain PCC 6803 hypothetical protein sll1433.
- Methanococcus jannaschii hypothetical protein MJ1566.

[2051] These are proteins of about 30 to 40 Kd whose central region is well conserved. They can be picked up in
the database by the following patterns.

[2052] Consensus pattern: [SAV]-[IVW]-[LVA]-[LIV]-G-[PNS]-G-L-[GP]-x-[DENQT]
Consensus pattern: [GA]-G-x-G-D-[TV]-[LT]-[STA]-G-x-[LIVM]
[2053] 831. (ACOX)
Acyl-CoA oxidase

[2054] This is a family of acyl-CoA oxidases (EC:1.3.3.6). Acyl-CoA oxidase converts acyl-CoA into trans-2-enoyl-CoA
[1].

Number of members: 39

[2055] [1] Hayashi H, De Bellis L, Yamaguchi K, Kato A, Hayashi M, Nishimura M; Medline: 98192624. Molecular
characterization of a glyoxysomal long chain acyl-CoA oxidase that is synthesized as a precursor of higher molecular
mass in pumpkin. J Biol Chem 1998;273:8301-8307.
[2056] 832. (AICARFT_IMPCHas)
AICARFT/IMPCHase bienzyme

[2057] This is a family of bifunctional enzymes catalysing the last steps in de novo purine biosynthesis. The bifunctional
enzyme is found in both prokaryotes and eukaryotes. The second last step is catalysed by
5-aminoimidazole-4-carboxamide ribonucleotide formyltransferase EC:2.1.2.3 (AICARFT); this enzyme catalyses the formylation of
AICAR with 10-formyl-tetrahydrofolate to yield FAICAR and tetrahydrofolate [1]. The last step is catalysed by IMP (inosine
monophosphate) cyclohydrolase EC:3.5.4.10 (IMPCHase), cyclizing FAICAR (5-formylaminoimidazole-4-carboxamide
ribonucleotide) to IMP [1].

Number of members: 22

[2058] 

[1] Akira T, Komatsu M, Nango R, Tomooka A, Konaka K, Yamauchi M, Kitamura Y, Nomura S, Tsukamoto I;
Medline: 97473523. Molecular cloning and expression of a rat cDNA encoding 5-aminoimidazole-4-carboxamide
ribonucleotide formyltransferase/IMP cyclohydrolase [published erratum appears in Gene 1998 Feb 27;208(2):
337]. Gene 1997;197:289-293.

(21 Rayl EA, Moroson BA» Beardsley GP; Medline: 95147205 The human purH gene product. 5-aminoimidazole- 
4-cartx)xamide ribonucleotide fomnyttransf erase/1 MP cyclohydrolase. Ckxiing, sequencing, expression, purifica- 
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[11 

Sandal N.N., Marcker K.A.
Trends Biochem. Sci. 19:19-19(1994).

[2]
Smith F.W., Hawkesford M.J., Prosser I.M., Clarkson D.T.
Mol. Gen. Genet. 247:709-715(1995).



[2038] 825. TYA: TYA transposon protein 

Ty are yeast transposons. A 5.7kb transcript codes for p3, a fusion protein of TYA and TYB. The TYA protein is analogous
to the gag protein of retroviruses. TYA is cleaved to form a 46kd protein which can form mature virion-like particles [1].
Number of members: 59

[2039] [1] Medline: 97404699. Cryo-electron microscopy structure of yeast Ty retrotransposon virus-like particles.
Palmer KJ, Tichelaar W, Myers N, Burns NR, Butcher SJ, Kingsman AJ, Fuller SD, Saibil HR; J Virol 1997;71:
6863-6868.

[2040] 826. Aldolase_II

Class II Aldolase and Adducin N-terminal domain.

This family includes class II aldolases and adducins which have not been ascribed any enzymatic function. Number
of members: 37



References: 



[2041] 



[1] Medline: 93294819. The spatial structure of the class II L-fuculose-1-phosphate aldolase from Escherichia coli.
Dreyer MK, Schulz GE; J Mol Biol 1993;231:549-553.

[2] Medline: 96256522. Catalytic mechanism of the metal-dependent fuculose aldolase from Escherichia coli as
derived from the structure. Dreyer MK, Schulz GE; J Mol Biol 1996;259:458-466.

[2042] 827. CBD_2



- Two tryptophan residues are involved in cellulose binding.

- Cellulose binding domain found in bacteria. Number of members: 51



References: 



[2043] [1] Medline: 95284032. Solution structure of a cellulose-binding domain from Cellulomonas fimi by nuclear
magnetic resonance spectroscopy. Xu GY, Ong E, Gilkes NR, Kilburn DG, Muhandiram DR, Harris-Brandts M, Carver
JP, Kay LE, Harvey TS; Biochemistry 1995;34:6993-7009.
[2044] 828. P 

A unique feature of the eukaryotic subtilisin-like proprotein convertases is the presence of an additional highly
conserved sequence of approximately 150 residues (P domain) located immediately downstream of the catalytic domain.
Number of members: 91



References: 



[2045] 



[1] Medline: 94252314. A C-terminal domain conserved in precursor processing proteases is required for
intramolecular N-terminal maturation of pro-Kex2 protease. Gluschankof P, Fuller RS; EMBO J 1994;13:2280-2288.
[2] Medline: 98225190. Regulatory roles of the P domain of the subtilisin-like prohormone convertases. Zhou A,
Martin S, Lipkind G, LaMendola J, Steiner DF; J Biol Chem 1998;273:11107-11114.

[2046] 829. Uncharacterized protein family UPF0020 signature 
PROSITE cross-reference(s): PS01261; UPF0020 

The following uncharacterized proteins have been shown [1] to share regions of similarities: 

- Escherichia coli hypothetical protein ycbY and HI0116/15, the corresponding Haemophilus influenzae protein.






References 
[2071] 
[2072] 835. (Asp_Glu_race)

Aspartate and glutamate racemases signatures 

[2073] Cross-reference(s) PS00923; ASP_GLU_RACEMASE_1 PS00924; 
ASP_GLU_RACEMASE_2 

[2074] Aspartate racemase (EC 5.1.1.13) and glutamate racemase (EC 5.1.1.3) are two evolutionary related bacterial
enzymes that do not seem to require a cofactor for their activity [1]. Glutamate racemase, which interconverts
L-glutamate into D-glutamate, is required for the biosynthesis of peptidoglycan and some peptide-based antibiotics
such as gramicidin S. In addition to characterized aspartate and glutamate racemases, this family also includes a
hypothetical protein from Erwinia carotovora and one from Escherichia coli (ygeA). Two conserved cysteines are
present in the sequence of these enzymes. They are expected to play a role in catalytic activity by acting as bases in
proton abstraction from the substrate. Signature patterns were developed for both cysteines.

[2075] Consensus pattern: [IVA]-[LIVM]-x-C-x(0,1)-N-[ST]-[MSA]-[STH]-[LIVFYSTANK]

Consensus pattern: [LIVM](2)-x-[AG]-C-T-[DEH]-[LIVMFY]-[PNGRS]-x-[LIVM]

[2076] [1] Gallo K.A., Knowles J.R., Biochemistry 32:3981-3990(1993).

[2077] 836. (ATP-sulfurylase)

ATP-sulfurylase 

[2078] This family consists of ATP-sulfurylase or sulfate adenylyltransferase EC:2.7.7.4, some of which are part of a
bifunctional polypeptide chain associated with adenosyl phosphosulphate (APS) kinase APS_kinase. Both enzymes
are required for PAPS (phosphoadenosine-phosphosulfate) synthesis from inorganic sulphate [2]. ATP sulfurylase
catalyses the synthesis of adenosine-phosphosulfate APS from ATP and inorganic sulphate [1].

Number of members: 37 

[2079] 

[1] Kurima K, Warman ML, Krishnan S, Domowicz M, Krueger RC Jr, Deyrup A, Schwartz NB; Medline: 98337975
"A member of a family of sulfate-activating enzymes causes murine brachymorphism" [published erratum appears
in Proc Natl Acad Sci U S A 1998 Sep 29;95(20):12071] Proc Natl Acad Sci U S A 1998;95:8681-8685.
[2] Rosenthal E, Leustek T; Medline: 96096529 "A multifunctional Urechis caupo protein, PAPS synthetase, has
both ATP sulfurylase and APS kinase activities." Gene 1995;165:243-248.

[2080] 837. (ATP-synt_F)
ATP synthase (F/14-kDa) subunit 

[2081] This family includes the 14-kDa subunit from vATPases [1], which is in the peripheral catalytic part of the complex
[2]. The family also includes archaebacterial ATP synthase subunit F [3].

Number of members: 23 
[2082] 

[1] Guo Y, Kaiser K, Wieczorek H, Dow JA; Medline: 96269411 The Drosophila melanogaster gene vha14 encoding
tion, kinetic analysis, and domain mapping." J Biol Chem 1996;271:2225-2233.

[2059] 833. (AOX) 
Alternative oxidase 

[2060] The alternative oxidase is used as a second terminal oxidase in the mitochondria; electrons are transferred
directly from reduced ubiquinol to oxygen, forming water [2]. This is not coupled to ATP synthesis and is not inhibited
by cyanide; this pathway is a single step process [1]. In rice the transcript levels of the alternative oxidase are increased
by low temperature [1].

Number of members: 27 

[2061] 

[1] Ito Y, Saisho D, Nakazono M, Tsutsumi N, Hirai A; Medline: 98086211 "Transcript levels of tandem-arranged
alternative oxidase genes in rice are increased by low temperature." Gene 1997;203:121-129.

[2] Li Q, Ritzel RG, McLean LL, McIntosh L, Ko T, Bertrand H, Nargang FE; Medline: 96366413 "Cloning and analysis
of the alternative oxidase gene of Neurospora crassa." Genetics 1996;142:129-140.

[2062] 834. (APH) 

Protein kinases signatures and profile 

[2063] Cross-reference(s): PS00107; PROTEIN_KINASE_ATP, PS00108;
PROTEIN_KINASE_ST, PS00109; PROTEIN_KINASE_TYR, PS50011;
PROTEIN_KINASE_DOM

[2064] Eukaryotic protein kinases [1 to 5] are enzymes that belong to a very extensive family of proteins which share
a conserved catalytic core common to both serine/threonine and tyrosine protein kinases. There are a number of
conserved regions in the catalytic domain of protein kinases. Two of these regions have been selected to build signature
patterns. The first region, which is located in the N-terminal extremity of the catalytic domain, is a glycine-rich stretch
of residues in the vicinity of a lysine residue, which has been shown to be involved in ATP binding. The second region,
which is located in the central part of the catalytic domain, contains a conserved aspartic acid residue which is important
for the catalytic activity of the enzyme [6]; two signature patterns were derived for that region: one specific for
serine/threonine kinases and the other for tyrosine kinases. A profile was developed which is based on the alignment
in [1] and covers the entire catalytic domain.

[2065] Consensus pattern: [LIV]-G-{P}-G-{P}-[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]-{PD}-x-[GSTACLIVMFY]-
x(5,18)-[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K [K binds ATP]

[2066] Sequences known to belong to this class detected by the pattern: the majority of known protein kinases, but it
fails to find a number of them, especially viral kinases, which are quite divergent in this region and are completely
missed by this pattern.

[2067] Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3) [D is an active site residue]
[2068] Sequences known to belong to this class detected by the pattern: most serine/threonine specific protein
kinases with 10 exceptions (half of them viral kinases) and also Epstein-Barr virus BGLF4 and Drosophila ninaC, which
have respectively Ser and Arg instead of the conserved Lys and which are therefore detected by the tyrosine kinase
specific pattern described below.

[2069] Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3) [D is an active site
residue] Sequences known to belong to this class detected by the pattern: tyrosine specific protein kinases with the
exception of human ERBB3 and mouse blk. This pattern will also detect most bacterial aminoglycoside
phosphotransferases [8,9] and herpesviruses ganciclovir kinases [10], which are proteins structurally and evolutionary
related to protein kinases. Sequences known to belong to this class detected by the profile: ALL, except for three viral
kinases. This profile also detects receptor guanylate cyclases (see <PDOC00430>) and 2-5A-dependent
ribonucleases. Sequence similarities between these two families and the eukaryotic protein kinase family have been
noticed before. It also detects Arabidopsis thaliana kinase-like protein TMKL1, which seems to have lost its catalytic
activity.

[2070] Note: if a protein analyzed includes the two protein kinase signatures, the probability of it being a protein kinase
is close to 100%. Note: eukaryotic-type protein kinases have also been found in prokaryotes such as Myxococcus
xanthus [11] and Yersinia pseudotuberculosis. Note: the patterns shown above have been updated since their
publication in [7]. Note: this documentation entry is linked to both signature patterns and a profile. As the profile is
much more sensitive than the patterns, you should use it if you have access to the necessary software tools to do so.
[2094] Consensus pattern: N-x-D-G-S-x(4)-C-G-N-[GA]-x-R [C is an active site residue] Sequences known to belong
to this class detected by the pattern ALL, except for an Anabaena dapF which has a Ser instead of the active site Cys.
[2095] [1] Cirilli M., Zheng R., Scapin G., Blanchard J.S., Biochemistry 37:16452-16458(1998).
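
The consensus patterns quoted throughout this section use the PROSITE notation ("x" for any residue, "[...]" for allowed residues, "{...}" for excluded residues, "(n)" or "(n,m)" for repeats). As a minimal sketch, not part of the original disclosure, such a pattern can be translated into a regular expression and used to scan a sequence. The function names and the two toy sequences below are hypothetical; the pattern string is the diaminopimelate epimerase signature as read from the text above, and anchors ("<", ">") are not handled.

```python
import re

# Diaminopimelate epimerase signature as read from the text (assumed reading).
DAP_PATTERN = "N-x-D-G-S-x(4)-C-G-N-[GA]-x-R"


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate the residue specification from an optional repeat, e.g. x(4) or x(0,1).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":
            atom = "."                       # any residue
        elif core[0] == "[" and core[-1] == "]":
            atom = core                      # one of the listed residues
        elif core[0] == "{" and core[-1] == "}":
            atom = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            atom = re.escape(core)           # a single fixed residue
        if low and high:
            atom += "{%s,%s}" % (low, high)
        elif low:
            atom += "{%s}" % low
        parts.append(atom)
    return "".join(parts)


def find_hits(pattern, sequence):
    """Return 1-based start positions of all (possibly overlapping) matches."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in regex.finditer(sequence.upper())]


# Hypothetical toy sequences: the second carries Ser in place of the active site Cys.
toy_with_cys = "MKNADGSAAAACGNGWRLL"
toy_with_ser = "MKNADGSAAAASGNGWRLL"
print(prosite_to_regex(DAP_PATTERN))
print(find_hits(DAP_PATTERN, toy_with_cys))
print(find_hits(DAP_PATTERN, toy_with_ser))
```

The second toy sequence illustrates why a sequence with Ser in place of the active site Cys, as described for the Anabaena dapF, would be missed by this pattern.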
[2096] 842. (DNA_gyraseB_C) 
DNA topoisomerase II signature

[2097] Cross-reference(s) PS00177; TOPOISOMERASE_II

DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,E1] is one of the two types of enzyme that catalyze the interconversion
of topological DNA isomers. Type II topoisomerases are ATP-dependent and act by passing a DNA segment through
a transient double-strand break. Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes, and
in African Swine Fever virus (ASF). In bacteriophage T4, topoisomerase II consists of three subunits (the products of
genes 39, 52 and 60). In prokaryotes and in archaebacteria the enzyme, known as DNA gyrase, consists of two subunits
(genes gyrA and gyrB [E2]). In some bacteria, a second type II topoisomerase has been identified; it is known as
topoisomerase IV and is required for chromosome segregation. It also consists of two subunits (genes parC and parE).
In eukaryotes, type II topoisomerase is a homodimer.

[2098] There are many regions of sequence homology between the different subtypes of topoisomerase II. The
relation between the different subunits is shown in the following representation:



<----------------------- About 1400 residues ----------------------->

[ Protein 39 * ][ ------------- Protein 52 ------------- ]   Phage T4
[ gyrB       * ][ gyrA                                   ]   Prokaryote II, Archaebacteria
[ parE       * ][ parC                                   ]   Prokaryote IV
[            *                                           ]   Eukaryote and ASF

'*': Position of the pattern.

[2099] As a signature pattern for this family of proteins, a region that contains a highly conserved pentapeptide was
selected. The pattern is located in gyrB, in parE, and in protein 39 of phage T4 topoisomerase.
[2100] Consensus pattern: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG]

[1] Sternglanz R., Curr. Opin. Cell Biol. 1:533-535(1990).

[2] Bjornsti M.-A., Curr. Opin. Struct. Biol. 1:99-103(1991).

[3] Sharma A., Mondragon A., Curr. Opin. Struct. Biol. 5:39-47(1995).

[4] Roca J., Trends Biochem. Sci. 20:156-160(1995).

[2101] 843. (DUF16) 
Protein of unknown function 

[2102] The function of this protein is unknown. It appears to occur only in Mycoplasma pneumoniae.
Number of members: 26 

[2103] [1] Himmelreich R, Hilbert H, Plagens H, Pirkl E, Li BC, Herrmann R; Medline: 97105885 "Complete sequence
analysis of the genome of the bacterium Mycoplasma pneumoniae." Nucleic Acids Res 1996;24:4420-4449.
[2104] 844. (DUF21) 
[2105] Domain of unknown function 

[2106] This transmembrane region has no known function. Many of the sequences in this family are annotated as 
hemolysins, however this is due to a similarity to Swiss:Q54316 that does not contain this domain. This domain is 
found in the N-terminus of the proteins adjacent to two intracellular CBS domains CBS. 
a 14-kDa F-subunit of the vacuolar ATPase." Gene 1996;172:239-243.

[2] [...] Stone DK; Medline: 96216416 "Identification of a 14-kDa subunit associated
with the catalytic sector of clathrin-coated vesicle H+-ATPase." J Biol Chem 1996;271:3324-3327.
[3] Wilms R, Freiberg C, Wegerle E, Meier I, Mayer F, Muller V; Medline: 96324968 "Subunit structure and
organization [...] Methanosarcina mazei Go1." J Biol Chem 1996;271:18843-18852.

[2083] 838. (CBD_4)
Starch binding domain 

Number of members: 48 

[2084] 839. (CbiX)

[2085] The function of CbiX is uncertain; however, it is found in cobalamin biosynthesis operons and so may have a
related function. Some CbiX proteins contain a striking histidine-rich region at their C-terminus, which suggests that it
might be involved in metal chelation [1].
Number of members: 6

[2086] [1] Raux E, Lanois A, Warren MJ, Rambach A, Thermes C; Medline: 98416126 "Cobalamin (vitamin B12)
biosynthesis: identification and characterization of a Bacillus megaterium cobI operon." Biochem J 1998;335:159-166.



840. (Complex1_51K)

Respiratory-chain NADH dehydrogenase 51 Kd subunit signatures. Cross-reference(s) PS00644;

Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or NADH-ubiquinone
oxidoreductase) is an oligomeric enzymatic complex located in the inner mitochondrial membrane which also seems
to exist in the chloroplast and in cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30

[2089] The 51 Kd subunit is highly similar to [3,4]:

- Subunit alpha of Alcaligenes eutrophus NAD-reducing hydrogenase (gene hoxF), which also binds to NAD and FMN.

- Subunit NQO1 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.

- Subunit F of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoF).

[2091] Consensus pattern: G-[AM]-G-[AR]-Y-[LIVM]-C-G-[DE](2)-[STA](2)-[LIM](2)-[EN]-S
Consensus pattern: E-S-C-G-x-C-x-P-C-R-x-G [The three C's are putative 2Fe-2S ligands]

[1] Ragan C.I., Curr. Top. Bioenerg. 15:1-36(1987).

[2] Weiss H., Friedrich T., Hofhaus G., Preis D., Eur. J. Biochem. 197:563-576(1991).

[3] Fearnley I.M., Walker J.E., Biochim. Biophys. Acta 1140:105-134(1992).

[4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H., J. Mol. Biol. 233:109-122(1993).

[2092] 841. (DAP_epimerase)

Diaminopimelate epimerase signature

[2093] Cross-reference(s) PS01326; DAP_EPIMERASE

Diaminopimelate epimerase (EC 5.1.1.7) catalyzes the isomerization of L,L- to D,L-meso-diaminopimelate in the
biosynthetic pathway leading from aspartate to lysine. This enzyme is a protein in which two conserved cysteines
seem to function as the acid and base in the catalytic mechanism. As a signature pattern, the region surrounding
the first of these two active site cysteines was selected.

diffusion channels that allow any solute up to a certain size (that size is known as the exclusion limit) to cross the
membrane, while other porins are specific for a solute and contain a binding site for that solute inside the pores (these
are known as selective porins). As porins are the major outer membrane proteins, they also serve as receptor sites
for the binding of phages and bacteriocins. General diffusion porins generally assemble as trimers in the membrane
and the transmembrane core of these proteins is composed exclusively of beta strands [2]. It has been shown [3] that
a number of general porins are evolutionary related; these porins are:

- Enterobacteria phoE.
- Enterobacteria ompC.
- Enterobacteria ompF.
- Enterobacteria nmpC.
- Bacteriophage PA-2 LC.
- Neisseria PI.A.
- Neisseria PI.B.

[2125] As a signature pattern a conserved region was selected, located in the C-terminal part of these proteins, 
which spans two putative transmembrane beta strands. 

[2126] Consensus pattern: [LIVMFY]-x(2)-G-x(2)-Y-x-F-x-K-x(2)-[SN]-[STAV]-[LIVMFYW]-V

[1] Benz R., Bauer K., Eur. J. Biochem. 176:1-19(1988).

[2] Jap B.K., Walian P.J., Q. Rev. Biophys. 23:367-403(1990).

[3] Jeanteur D., Lakey J.H., Pattus F., Mol. Microbiol. 5:2153-2164(1991).

[2127] 851. (HlyD)

HlyD family secretion proteins signature

[2128] Cross-reference(s) PS00543; HLYD_FAMILY

Gram-negative bacteria produce a number of proteins which are secreted into the growth medium by a mechanism
that does not require a cleaved N-terminal signal sequence. These proteins, while having different functions, require
the help of two or more proteins for their secretion across the cell envelope, amongst which are a protein belonging
to the ABC transporters family (see the relevant entry <PDOC00185>) and a protein belonging to a family which is
currently composed [1 to 5] of the following members:



Gene    Species                       Protein which is exported
hlyD    Escherichia coli              Hemolysin
appD    A. pleuropneumoniae           Hemolysin
lcnD    Lactococcus lactis            Lactococcin A
lktD    A. actinomycetemcomitans,     Leukotoxin
        Pasteurella haemolytica
rtxD    A. pleuropneumoniae           Toxin-III
cyaD    Bordetella pertussis          Calmodulin-sensitive adenylate cyclase-hemolysin (cyclolysin)
cvaA    Escherichia coli              Colicin V
prtE    Erwinia chrysanthemi          Extracellular proteases B and C
aprE    Pseudomonas aeruginosa        Alkaline protease
emrA    Escherichia coli              Drugs and toxins
yjcR    Escherichia coli              Unknown

These proteins are evolutionary related and consist of from 390 to 480 amino acid residues. They seem to be anchored
in the inner membrane by a N-terminal transmembrane region. Their exact role in the secretion process is not yet
known. The C-terminal section of these proteins is the best conserved region; a signature pattern from that region was
derived.

[2129] Consensus pattern: (LIVMl-x(2)-G-[UVI]-x(3)-[STGAVl-x-lLlVMTl-x-(UVI^-{GE]-x-{KR]-x-(UVMFYWl(2hx- 
(L1VMFYW](3) 

Sequences known to belong to this class detected by the pattern ALL, except for emrA and yjcR.

Number of members: 42



[2107] 845. (DUF56)
[2108] Integral membrane protein

[2109] The members of this family are putative integral membrane proteins. The function of the family is unknown;
however, the family includes Sec59 from yeast. Sec59 is a dolichol kinase. It is not clear whether the enzymatic
activity resides in this region or its N terminal region.

Number of members: 13

[2110] 846. (DUF94) 

[2111] Domain of unknown function 

[2112] The function of this domain is unknown. It is found in both eukaryotes and archaebacteria. The alignment
contains a completely conserved aspartate residue [...]

Number of members: 9 

[2113] 847. (FF) 
[2114] FF domain

[2115] This domain may be involved in protein-protein interaction [1].
Number of members: 42

[2116] [1] [...] Medline: 99322199 "The FF domain: a novel motif that often accompanies WW
domains." Trends Biochem Sci 1999;24:264-265.

[2117] 848. (FLO_LFY)

Floricaula / Leafy protein

[2118] This family consists of various plant development proteins which are homologues of floricaula (FLO) and Leafy [...]

Number of members: 16 
[2119] 

[1] [...] Matthews R, Michael A, Ellis N; Medline: 97411151 "UNIFOLIATA
regulates leaf and flower morphogenesis in pea." Curr Biol 1997;7:581-587.

[2] Weigel D, Alvarez J, Smyth DR, Yanofsky MF, Meyerowitz EM; Medline: 92274452 "LEAFY controls floral
meristem identity in Arabidopsis." Cell 1992;69:843-859.

[2120] 849. (G-patch) 
G -patch domain 

[2121] This domain is found in a number of RNA binding proteins, and is also found in proteins that contain RNA [...]

Number of members: 47

[2122] [1] [...] new conserved domain in eukaryotic RNA-processing
proteins and type D retroviral polyproteins." Trends Biochem Sci 1999;24:342-344.

[2123] 850. (Gram-ve_porins)

General diffusion Gram-negative porins signature

[2124] Cross-reference(s) PS00576; GRAM_NEG_PORIN

The outer membrane of Gram-negative bacteria acts as a molecular filter for hydrophilic compounds. Proteins known
as porins [1] are responsible for the 'molecular sieve' properties of the outer membrane. Porins form large water-filled
channels which allow the diffusion of hydrophilic molecules into the periplasmic space. Some porins form general
- Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp).
- Escherichia coli lipoprotein-28 (gene nlpA).
- Escherichia coli lipoprotein-34 (gene nlpB).
- Escherichia coli lipoprotein nlpC.
- Escherichia coli lipoprotein nlpD.
- Escherichia coli osmotically inducible lipoprotein B (gene osmB).
- Escherichia coli osmotically inducible lipoprotein E (gene osmE).
- Escherichia coli peptidoglycan-associated lipoprotein (gene pal).
- Escherichia coli rare lipoproteins A and B (genes rlpA and rlpB).
- Escherichia coli copper homeostasis protein cutF (or nlpE).
- Escherichia coli plasmids traT proteins.
- Escherichia coli Col plasmids lysis proteins.
- A number of Bacillus beta-lactamases.
- Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA).
- Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB).
- Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7).
- Chlamydia trachomatis outer membrane protein 3 (gene omp3).
- Fibrobacter succinogenes endoglucanase cel-3.
- Haemophilus influenzae proteins Pal and Pcp.
- Klebsiella pullulanase (gene pulA).
- Klebsiella pullulanase secretion protein pulS.
- Mycoplasma hyorhinis protein p37.
- Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC).
- Neisseria outer membrane protein H.8.
- Pseudomonas aeruginosa lipopeptide (gene lppL).
- Pseudomonas solanacearum endoglucanase egl.
- Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC).
- Rickettsia 17 Kd antigen.
- Shigella flexneri invasion plasmid proteins mxiJ and mxiM.
- Streptococcus pneumoniae oligopeptide transport protein A (gene amiA).
- Treponema pallidum 34 Kd antigen.
- Treponema pallidum membrane protein A (gene tmpA).
- Vibrio harveyi chitobiase (gene chb).
- Yersinia virulence plasmid protein yscJ.
- Halocyanin from Natronobacterium pharaonis [4], a membrane associated copper-binding protein. This is the first
  archaebacterial protein known to be modified in such a fashion.

[2140] From the precursor sequences of all these proteins, a consensus pattern and a set of rules to identify this
type of post-translational modification were derived.

[2141] Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment
site] Additional rules:

[2142] 1) The cysteine must be between positions 15 and 35 of the sequence in consideration. 2) There must be at
least one Lys or one Arg in the first seven positions of the sequence. Sequences known to belong to this class detected
by the pattern: ALL. Other sequence(s) detected in SWISS-PROT: some 100 prokaryotic proteins. Some of them are
not membrane lipoproteins, but at least half of them could be.
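
The lipid attachment site test described above combines a consensus pattern with two positional rules. The following is a minimal sketch of how the three conditions could be checked together; it is an illustration, not part of the original disclosure. The function name and the toy sequences are hypothetical, the regular expression encodes the pattern as read from the text ({DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C), and treating "between positions 15 and 35" as inclusive is an assumption.

```python
import re

# Assumed reading of the consensus pattern; the final C is the lipid attachment site.
# A lookahead is used so that overlapping candidate sites are all examined.
LIPOBOX = re.compile(r"(?=[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C)")
PATTERN_LENGTH = 11  # 6 + 2 + 1 + 1 + 1 residues


def lipid_attachment_sites(sequence):
    """Return 1-based positions of cysteines that satisfy the pattern and both rules."""
    seq = sequence.upper()
    # Rule 2: at least one Lys or Arg within the first seven positions.
    if not any(residue in "KR" for residue in seq[:7]):
        return []
    sites = []
    for m in LIPOBOX.finditer(seq):
        cys_position = m.start() + PATTERN_LENGTH  # 1-based position of the final C
        # Rule 1: the cysteine lies between positions 15 and 35 (inclusive, assumed).
        if 15 <= cys_position <= 35:
            sites.append(cys_position)
    return sites


# Hypothetical toy precursor sequences for illustration only.
toy_precursor = "MKATKLVLGAVILGSTLLAGCSSNAKIDQ"
toy_no_basic = "MQATQLVLGAVILGSTLLAGCSSNAKIDQ"   # no Lys/Arg in the first seven residues
toy_early_cys = "MKVILGSTLLAGCSS"                # pattern present, cysteine before position 15
print(lipid_attachment_sites(toy_precursor))
print(lipid_attachment_sites(toy_no_basic))
print(lipid_attachment_sites(toy_early_cys))
```

Applying the positional rules after the pattern match, rather than encoding them in the regular expression, keeps each condition separately visible and easy to adjust.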

References 

[2143] 

[1] Hayashi S., Wu H.C., J. Bioenerg. Biomembr. 22:451-471(1990).
[2] Klein P., Somorjai R.L., Lau P.C.K., Protein Eng. 2:15-20(1988).
[3] von Heijne G., Protein Eng. 2:531-534(1989).

[4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M., J. Biol. Chem. 269:14939-14945
(1994).



[2144] 856. (Lipoprotein_7)
Adhesin lipoprotein 
References:
[2130]

[1] Gilson L., Mahanty H.K., Kolter R., EMBO J. 9:3875-3884(1990).

[2] Letoffe S., Delepelaire P., Wandersman C., EMBO J. 9:1375-1382(1990).

[3] Stoddard G.W., Petzel J.P., van Belkum M.J., Kok J., McKay L.L., Appl. Environ. Microbiol. 58:1952-1961(1992).
[4] Duong F., Lazdunski A., Cami B., Murgier M., Gene 121:47-54(1992).
[5] Lewis K., Trends Biochem. Sci. 19:119-123(1994).

[2131] 852. (IBR)
In Between Ring fingers

[2132] The IBR (In Between Ring fingers) domain is found to occur between pairs of ring fingers (zf-C3HC4). The
function of this domain is unknown. This domain has also been called the C6HC domain and DRIL (for double RING
finger linked) domain.

Number of members: 25 
[2133] 

[1] Morett E, Bork P; Medline: 10366851 "A novel transactivation domain in parkin." Trends Biochem Sci 1999;24:

[2] van der Reijden BA, Erpelinck-Verschueren CA, Lowenberg B, Jansen JH; Medline: 99349709 "TRIADs: a new
class of proteins with a novel cysteine-rich signature." Protein Sci 1999;8:1557-1561.



[2134] 853. (IPPT)
IPP transferase

[1] [...] Sasakawa C; Medline: 97440126 "The modified nucleoside [...] expression of virulence genes."
J Bacteriol 1997;179:5777-5782.

[2] [...] Martin NC, Hopper AK; Medline: 94187700 "Subcellular locations
of MOD5 proteins: mapping of sequences sufficient for targeting to mitochondria and demonstration that
mitochondrial and nuclear isoforms commingle in the cytosol." Mol Cell Biol 1994;14:2298-2306.
[3] Gillman EC, Slusher LB, Martin NC, Hopper AK; Medline: 91203856 "MOD5 translation initiation sites determine
N6-isopentenyladenosine modification of mitochondrial and cytoplasmic tRNA." Mol Cell Biol 1991;11:2382-2390.

[2135] 854. (KE2)
KE2 family protein 

Sg le^:i"SSlri2r °' ' ^ ^"^""^ '^^ ' ""^^ 

Number of members: 9

[2137] 

1I2K r^i?.Gri£^;s?^^''' '^^^ 

[2] [...] Medline: [...]29859 "YKE2, a yeast nuclear gene encoding a protein
showing homology to mouse KE2 and containing a putative leucine-zipper motif." Gene 1994;151:197-201.

[2138] 855. (Lipoprotein_6) 

Prokaryotic membrane lipoprotein lipid attachment site

[2139] Cross-reference(s) PS00013; PROKAR_LIPOPROTEIN

In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific
lipoprotein signal peptidase (signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream
of a cysteine residue to which a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such
processing currently include (for recent listings see [1,2,3]):
Number of members: 23
[2157]

[1] Wu P, Brockenbrough JS, Metcalfe AC, Chen S, Aris JP; Medline: 98298165 "Nop5p is a small nucleolar
ribonucleoprotein component required for pre-18S rRNA processing in yeast." J Biol Chem 1998;273:16453-16463.
[2] Gautier T, Berges T, Tollervey D, Hurt E; Medline: 98038777 "Nucleolar KKE/D repeat proteins Nop56p and Nop58p
interact with Nop1p and are required for ribosome biogenesis." Mol Cell Biol 1997;17:7088-7098.
[3] Weidenhammer EM, Singh M, Ruiz-Noriega M, Woolford JL Jr; Medline: 96184869 "The PRP31 gene encodes
a novel protein required for pre-mRNA splicing in Saccharomyces cerevisiae." Nucleic Acids Res 1996;24:
1164-1170.

[2158] 861. (Nramp) 

Natural resistance-associated macrophage protein 

The natural resistance-associated macrophage protein (NRAMP) family consists of Nramp1, Nramp2, and yeast
proteins Smf1 and Smf2. The NRAMP family is a novel family of functionally related proteins defined by a conserved
hydrophobic core of ten transmembrane domains [5]. This family of membrane proteins are divalent cation transporters.
Nramp1 is an integral membrane protein expressed exclusively in cells of the immune system and is recruited to the
membrane of a phagosome upon phagocytosis [1]. By controlling divalent cation concentrations Nramp1 may regulate
the interphagosomal replication of bacteria [1]. Mutations in Nramp1 may genetically predispose an individual to
susceptibility to diseases including leprosy and tuberculosis; conversely, this might however provide protection from
rheumatoid arthritis [1]. Nramp2 is a multiple divalent cation transporter for Fe2+, Mn2+ and Zn2+ amongst others; it is
expressed at high levels in the intestine, and is a major transferrin-independent iron uptake system in mammals [1]. The
yeast proteins Smf1 and Smf2 may also transport divalent cations [3].

Number of members: 36
[2159]

[1] Govoni G, Gros P; Medline: 98383996 "Macrophage NRAMP1 and its role in resistance to microbial infections."
Inflamm Res 1998;47:277-284.

[2] Agranoff DD, Krishna S; Medline: 98294035 "Metal ion homeostasis and intracellular parasitism." Mol Microbiol
1998;28:403-412.

[3] Pinner E, Gruenheid S, Raymond M, Gros P; Medline: 98030569 "Functional complementation of the yeast
divalent cation transporter family SMF by NRAMP2, a member of the mammalian natural resistance-associated
macrophage protein family." J Biol Chem 1997;272:28933-28938.

[4] Cellier M, Belouchi A, Gros P; Medline: 96402487 "Resistance to intracellular infections: comparative genomic
analysis of Nramp." Trends Genet 1996;12:201-204.

[5] Cellier M, Prive G, Belouchi A, Kwan T, Rodrigues V, Chia W, Gros P; Medline: 96036029 "Nramp defines a
family of membrane proteins." Proc Natl Acad Sci U S A 1995;92:10089-10093.

[2160] 862. (NTP_transf_2)
Nucleotidyltransferase domain

Members of this family belong to a large family of nucleotidyltransferases [1].

Number of members: 83

[2161] [1] Holm L, Sander C; Medline: 96005605 "DNA polymerase beta belongs to an ancient nucleotidyltransferase
superfamily." Trends Biochem Sci 1995;20:345-347.
[2162] 863. (Paramyxo_P)

Paramyxovirus P phosphoprotein 

[2163] This family consists of paramyxovirus P phosphoprotein from Sendai virus and human and bovine
parainfluenza viruses. The P protein is an essential part of the viral RNA polymerase complex formed from the P and
L proteins [1]. The exact role of the P protein in this complex is unknown but it is involved in multiple protein-protein
interactions and binding the polymerase complex to the nucleocapsid or ribonucleoprotein template [1]. It also appears
to be important for the proper folding of the L protein [1]. The paramyxoviruses have a negative sense ssRNA genome [1].
[2145] This family consists of the p50 and variable adherence-associated antigen (Vaa) adhesins from Mycoplasma
hominis. M. hominis is a mycoplasma associated with human urogenital diseases, pneumonia, and septic arthritis [1].
An adhesin is a cell surface molecule that mediates adhesion to other cells or to the surrounding surface or substrate.
The Vaa antigen is a 50-kDa surface lipoprotein that has four tandem repetitive DNA sequences encoding a periodic
peptide structure, and is highly immunogenic in the human host [1]. p50 is also a 50-kDa lipoprotein, having three
repeats A, B and C, that may be a tetramer of 191-kDa in its native environment [2].

Number of members: 18

[2146] 

[1] Zhang Q, Wise KS; Medline: 96294768 "Molecular basis of size and antigenic variation of a Mycoplasma hominis
adhesin encoded by divergent vaa genes." Infect Immun 1996;64:2737-2744.

[2] Henrich B, Kitzerow A, Feldmann RC, Schaal H, Hadding U; Medline: 97047675 "Repetitive elements of the
Mycoplasma hominis adhesin p50 can be differentiated by monoclonal antibodies." Infect Immun 1996;64:
4027-4034.

[2147] 857. (MaoC_like)
MaoC like domain

[2148] The MaoC protein is found to share similarity with a wide variety of enzymes: estradiol 17 beta-dehydrogenase
4, peroxisomal hydratase-dehydrogenase-epimerase, and fatty acid synthase beta subunit. All these enzymes contain
other domains. This domain is also present in the NodN nodulation protein N. No specific function has been assigned
to this region of any of these proteins. The maoC gene is part of an operon with maoA, which is involved in the synthesis
of monoamine oxidase [1].

Number of members: 46 

[2149] [1] Sugino H, Sasaki M, Azakami H, Yamashita M, Murooka Y; Medline: 96235221 "A monoamine-regulated
Klebsiella aerogenes operon containing the monoamine oxidase structural gene (maoA) and the maoC gene."
J Bacteriol 1992;174:2485-2492.
[2150] 858. (MSP) 

Manganese-stabilizing protein / photosystem II polypeptide

[2151] This family consists of the 33 kDa photosystem II polypeptide from the oxygen evolving complex (OEC) of
plants and cyanobacteria. The protein is also known as the manganese-stabilizing protein as it is associated with the
manganese complex of the OEC and may provide the ligands for the complex [1].

Number of members: 17

[2152] [1] Philbrick JB, Zilinskas BA; Medline: 88334494 "Cloning, nucleotide sequence and mutational analysis of
the gene encoding the Photosystem II manganese-stabilizing polypeptide of Synechocystis 6803." Mol Gen Genet
1988;212:418-425.
[2153] 859. (NAC) 

[2154] [1] Makarova KS, Aravind L, Galperin MY, Grishin NV, Tatusov RL, Wolf YI, Koonin EV; Medline: 99342100
"Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and
the variable shell." Genome Res 1999;9:608-628.

Number of members: 27 

[2155] 860. (Nop) 

Putative snoRNA binding domain

[2156] This family consists of various pre-RNA processing ribonucleoproteins. The function of the aligned region is
unknown; however, it may be a common RNA or snoRNA or Nop1p binding domain. Nop5p (Nop58p) Swiss:Q12499
from yeast is the protein component of a ribonucleoprotein required for pre-18S rRNA processing and is suggested
to function with Nop1p in a snoRNA complex [1]. Nop56p Swiss:O00567 and Nop5p interact with Nop1p and
are required for ribosome biogenesis [2]. Prp31p Swiss:P49704 is required for pre-mRNA splicing in S. cerevisiae [3].






and delta 1-pyrroline-5-carboxylate dehydrogenase domains of the multifunctional Escherichia coli PutA protein. J
Mol Biol 1994;243:950-956.
[2175] 868. (PsbP)

[2176] This family consists of the 23 kDa subunit of the oxygen evolving system of photosystem II, or PsbP, from various
plants (where it is encoded by the nuclear genome) and Cyanobacteria. The 23 kDa PsbP protein is required for PSII
to be fully operational in vivo; it increases the affinity of the water oxidation site for Cl- and provides the conditions
required for high affinity binding of Ca2+ [2].

Number of members: 25 

[2177] 

[1] Rova EM, Mc Ewen B, Fredriksson PO, Styring S; Medline: 97067138 "Photoactivation and photoinhibition are
competing in a mutant of Chlamydomonas reinhardtii lacking the 23-kDa extrinsic subunit of photosystem II." J
Biol Chem 1996;271:28918-28924.

[2] Kochhar A, Khurana JP, Tyagi AK; Medline: 97191536 "Nucleotide sequence of the psbP gene encoding precursor
of 23-kDa polypeptide of oxygen-evolving complex in Arabidopsis thaliana and its expression in the wild-type
and a constitutively photomorphogenic mutant." DNA Res 1996;3:277-285.

[2178] 869. (PUA) 

[2179] The PUA domain, named after PseudoUridine synthase and Archaeosine transglycosylase, was detected in
archaeal and eukaryotic pseudouridine synthases, archaeal archaeosine synthases, a family of predicted ATPases
that may be involved in RNA modification, and a family of predicted archaeal and bacterial rRNA methylases. Additionally,
the PUA domain was detected in a family of eukaryotic proteins that also contain a domain homologous to the translation
initiation factor eIF1/SUI1; these proteins may comprise a novel type of translation factors. Unexpectedly, the PUA
domain was detected also in bacterial and yeast glutamate kinases; this is compatible with the demonstrated role of
these enzymes in the regulation of the expression of other genes [1]. It is predicted that the PUA domain is an RNA
binding domain.

Number of members: 48 

[2180] [1] Aravind L, Koonin EV; Medline: 99193178 "Novel predicted RNA-binding domains associated with the
translation machinery." J Mol Evol 1999;48:291-302.
[2181] 870. (RF1)
eRF1-like proteins

[2182] Members of this family are peptide chain release factors. The eukaryotic Release Factor 1 proteins (eRF1s)
are involved in termination of translation. The eRF1 protein is functional for all stop codons and appears to abolish
read-through of these codons. This family also includes other proteins for which the precise molecular function is
unknown. Many of them are from Archaebacteria. These proteins may also be involved in translation termination, but
this awaits experimental verification.

Number of members: 25 

[2183] 

[1] Frolova L, Le Goff X, Rasmussen HH, Cheperegin S, Drugeon G, Kress M, Arman I, Haenni AL, Celis JE,
Philippe M, et al; Medline: 95082951 "A highly conserved eukaryotic protein family possessing properties of
polypeptide chain release factor." [see comments] Nature 1994;372:701-703.

[2] Drugeon G, Jean-Jean O, Frolova L, Le Goff X, Philippe M, Kisselev L, Haenni AL; Medline: 97315314
"Eukaryotic release factor 1 (eRF1) abolishes readthrough and competes with suppressor tRNAs at all three
termination codons in messenger RNA." Nucleic Acids Res 1997;25:2254-2258.

[2184] 871. (Ribosomal_L14e)
Ribosomal protein L14

[2185] This family includes the eukaryotic ribosomal protein L14.

Number of members: 15
[2164] 

[1] Bowman MC, Smallwood S, Moyer SA; Medline: 99329169 "Dissection of Individual Functions of the Sendai
Virus Phosphoprotein in Transcription." J Virol 1999;73:6474-6483.

[2] Matsuoka Y, Curran J, Pelet T, Kolakofsky D, Ray R, Compans RW; Medline: 91237868 "The P gene of human
parainfluenza virus type 1 encodes P and C proteins but not a cysteine-rich V protein." J Virol 1991;65:3406-3410.

[2165] 864. (Patatin) 

[2166] This family consists of various patatin glycoproteins from plants. The patatin protein accounts for up to 40%
of the total soluble protein in potato tubers [2]. Patatin is a storage protein but it also has the enzymatic activity of lipid
acyl hydrolase, catalysing the cleavage of fatty acids from membrane lipids [2].



Number of members: 21 



[2167] 



[1] Banfalvi Z, Kostyal Z, Barta E; Medline: 95107249 "Solanum brevidens possesses a non-sucrose-inducible
patatin gene." Mol Gen Genet 1994;245:517-522.

[2] Mignery GA, Pikaard CS, Park WD; Medline: 88226014 "Molecular characterization of the patatin multigene
family of potato." Gene 1988;62:27-44.



[2168] 865. (Pentapeptide_2)
Pentapeptide repeats (8 copies)

[2169] These repeats are found in many mycobacterial proteins. They are most common in the PPE family
of proteins, where they are found in the MPTR subfamily of PPE proteins. The function of these repeats is unknown.
The repeat can be approximately described as XNXGX, where X can be any amino acid. These repeats are similar to
Pentapeptide [1]; however, it is not clear if these two families are structurally related.
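
As an illustration only, the approximate repeat description XNXGX given above can be turned into a simple scan for
tandem copies in a protein sequence. The sketch below is not the method used to define the family (which rests on a
profile alignment); the function name, the minimum copy number and the toy sequence are assumptions introduced here.

```python
import re

# One repeat unit as described in the text: X-N-X-G-X, where X is any residue.
UNIT = r"[A-Z]N[A-Z]G[A-Z]"

def find_tandem_repeats(seq, min_copies=2):
    """Return (start, end, copies) for each run of back-to-back XNXGX units.

    min_copies is a hypothetical threshold chosen for illustration.
    """
    pattern = re.compile(r"(?:%s){%d,}" % (UNIT, min_copies))
    hits = []
    for m in pattern.finditer(seq.upper()):
        copies = (m.end() - m.start()) // 5  # each unit is five residues long
        hits.append((m.start(), m.end(), copies))
    return hits

# Toy sequence (made up): three consecutive units flanked by unrelated residues.
toy = "MKT" + "ANSGA" + "GNTGS" + "FNIGL" + "PLV"
print(find_tandem_repeats(toy))
```

Because the description is only approximate, such a scan would be expected to report chance matches in unrelated
proteins; raising min_copies (the family name mentions 8 copies) makes the scan more selective.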



Number of members: 362 



[2170] 



[1] Bateman A, Murzin A, Teichmann SA; Medline: 98318059 "Structure and distribution of pentapeptide repeats
in bacteria." Protein Sci 1998;7:1477-1480.

[2] Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE 3rd,
Tekaia F, Badcock K, Basham D, Brown D, Chillingworth T, Connor R, Davies R, Devlin K, Feltwell T, Gentles S,
Hamlin N, Holroyd S, Hornsby T, Jagels K, Barrell BG; Medline: 98295987 "Deciphering the biology of Mycobacterium
tuberculosis from the complete genome sequence." Nature 1998;393:537-544.

[2171] 866. (Peptidase_C13) 
Peptidase C13 family

This family of peptidases is known as the hemoglobinase family because it contains a globin degrading enzyme from
blood parasites, Swiss:P42665. However, relatives are found in plants and other organisms that have other functions.
Members of this family are asparaginyl peptidases [1].

Number of members: 26 



[2172] [1] Chen JM, Dando PM, Rawlings ND, Brown MA, Young NE, Stevens RA, Hewitt E, Watts C, Barrett AJ;
Medline: 97218252 "Cloning, isolation, and characterization of mammalian legumain, an asparaginyl endopeptidase."
J Biol Chem 1997;272:8090-8098.

[2173] 867. (Pro_dh)
Proline dehydrogenase 



Number of members: 25 



[2174] [1] Ling M, Allen SW, Wood JM; Medline: 95055736 Sequence analysis identifies the proline dehydrog

The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus.
Nature 1997;390:364-370.

[2195] 876. (Transglut_core)

[2196] Cross-reference(s) PS00547; TRANSGLUTAMINASES

Transglutaminases (EC 2.3.2.13) (TGase) [1,2] are calcium-dependent enzymes that catalyze the cross-linking of
proteins by promoting the formation of isopeptide bonds between the gamma-carboxyl group of a glutamine in one
polypeptide chain and the epsilon-amino group of a lysine in a second polypeptide chain. TGases also catalyze the
conjugation of polyamines to proteins. The best known transglutaminase is blood coagulation factor XIII, a plasma
tetrameric protein composed of two catalytic A subunits and two non-catalytic B subunits. Factor XIII is responsible
for cross-linking fibrin chains, thus stabilizing the fibrin clot. Other forms of transglutaminases are widely distributed
in various organs, tissues and body fluids. Sequence data is available for the following forms of TGase:

- Transglutaminase K (TGase K), a membrane-bound enzyme found in mammalian epidermis and important for the
formation of the cornified cell envelope (gene TGM1).

- Tissue transglutaminase (TGase C), a monomeric ubiquitous enzyme located in the cytoplasm (gene TGM2).

- Transglutaminase 3, responsible for the later stages of cell envelope formation in the epidermis and the hair follicle
(gene TGM3).

- Transglutaminase 4 (gene TGM4).

[2197] A conserved cysteine is known to be involved in the catalytic mechanism of TGases. The erythrocyte membrane
band 4.2 protein, which probably plays an important role in regulating the shape of erythrocytes and their
mechanical properties, is evolutionarily related to TGases. However, the active site cysteine is substituted by an alanine
and the 4.2 protein does not show TGase activity.

[2198] Consensus pattern: [GT]-Q-[CA]-W-V-x-[SA]-[GA]-[IVT]-x(2)-T-x-[LMSC]-R-[CSA]-[LV]-G [The first C is the
active site residue]. Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected
in SWISS-PROT: NONE.
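
A consensus pattern of this kind can be applied to a sequence by translating it into a regular expression. The
following is a minimal sketch, assuming only the pattern elements that occur in this document (residue letters,
bracketed alternatives, braces for excluded residues, x for any residue, and repeat counts in parentheses); the
function name and the test peptide are hypothetical and are not taken from any sequence database.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        # Split each element into its core and an optional (n) or (n,m) count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."                        # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                       # one of the listed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            token = re.escape(core)            # a single fixed residue
        if lo and hi:
            token += "{%s,%s}" % (lo, hi)
        elif lo:
            token += "{%s}" % lo
        parts.append(token)
    return "".join(parts)

# The transglutaminase active-site pattern as given in paragraph [2198].
TGASE = "[GT]-Q-[CA]-W-V-x-[SA]-[GA]-[IVT]-x(2)-T-x-[LMSC]-R-[CSA]-[LV]-G"
tgase_regex = prosite_to_regex(TGASE)

# Made-up peptide constructed to satisfy the pattern, padded on both sides.
peptide = "AA" + "GQCWVFAGVLNTVLRCLG" + "KK"
match = re.search(tgase_regex, peptide)
print(tgase_regex, match.start() if match else None)
```

A pattern match of this sort indicates only that the signature is present; as the documentation notes elsewhere,
profiles are generally more sensitive than patterns.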

[2199] [1] Ichinose A., Bottenus R.E., Davie E.W. J. Biol. Chem. 265:13411-13414(1990). [2] Greenberg C.S.,
Birckbichler P.J., Rice R.H. FASEB J. 5:3071-3077(1991).

[2200] 877. (TruB_N) TruB family pseudouridylate synthase (N terminal domain)

Members of this family are involved in modifying bases in RNA molecules. They carry out the conversion of uracil
bases to pseudouridine. This family includes TruB, a pseudouridylate synthase that specifically converts uracil 55 to
pseudouridine in most tRNAs. This family also includes Cbf5p, which modifies rRNA [2].

Number of members: 33 

[2201] 

[1] Nurse K, Wrzesinski J, Bakin A, Lane BG, Ofengand J; Medline: 96079944 "Purification, cloning, and properties
of the tRNA psi 55 synthase from Escherichia coli." RNA 1995;1:102-112.

[2] Lafontaine DL, Bousquet-Antonelli C, Henry Y, Caizergues-Ferrer M, Tollervey D; Medline: 98139521 "The box
H + ACA snoRNAs carry Cbf5p, the putative rRNA pseudouridine synthase." Genes Dev 1998;12:527-537.

[2202] 878. (UDPGP) UTP-glucose-1-phosphate uridylyltransferase

This family consists of UTP-glucose-1-phosphate uridylyltransferases, EC:2.7.7.9, also known as UDP-glucose
pyrophosphorylase (UDPGP) and glucose-1-phosphate uridylyltransferase. UTP-glucose-1-phosphate uridylyltransferase
catalyses the interconversion of MgUTP + glucose-1-phosphate and UDP-glucose + MgPPi [1]. UDP-glucose
is an important intermediate in mammalian carbohydrate interconversion, involved in various metabolic roles depending
on tissue type [1]. In Dictyostelium (slime mold), mutants in this enzyme abort the development cycle [2]. Also within
the family is UDP-N-acetylglucosamine pyrophosphorylase Swiss:Q16222 or AGX1 [3] and two hypothetical proteins
from Borrelia burgdorferi, the Lyme disease spirochaete, Swiss:O51893 and Swiss:O51036.

Number of members: 16 

[2203] 

[1] Duggleby RG, Chao YC, Huang JG, Peng HL, Chang HY; Medline: 96202932 Sequence differences between
human muscle and liver cDNAs for UDPglucose pyrophosphorylase and kinetic properties of the recombinant

Number of members: 15

[2186] 872. (Ribosomal_S27)
Ribosomal protein S27a

[2187] This family of ribosomal proteins consists mainly of the 40S ribosomal protein S27a, which is synthesized as
a C-terminal extension of ubiquitin (CEP). The S27a domain comprises the C-terminal half of the protein. The
synthesis of ribosomal proteins as extensions of ubiquitin promotes their incorporation into nascent ribosomes by a
transient metabolic stabilization and is required for efficient ribosome biogenesis [3]. The ribosomal extension protein
S27a contains a basic region that is proposed to form a zinc finger; its fusion gene is proposed as a mechanism to
maintain a fixed ratio between ubiquitin, necessary for degrading proteins, and ribosomes, a source of proteins [2].

Number of members: 36 

[2188] 873. (Spermine_synth)
Spermine/spermidine synthase

[2189] Spermine and spermidine are polyamines. This family includes spermidine synthase, which catalyses the fifth
(last) step in the biosynthesis of spermidine from arginine, and spermine synthase.

Number of members: 39 

[2190] 

[1] Mezquita J, Pau M, Mezquita C; Medline: 97449308 "Characterization and expression of two chicken cDNAs
encoding ubiquitin fused to ribosomal proteins of 52 and 80 amino acids." Gene 1997;195:313-319.

[2] Redman KL, Rechsteiner M; Medline: 89181932 "Identification of the long ubiquitin extension as ribosomal
protein S27a." Nature 1989;338:438-440.

[3] Finley D, Bartel B, Varshavsky A; Medline: 89181925 "The tails of ubiquitin precursors are ribosomal proteins
whose fusion to ubiquitin facilitates ribosome biogenesis." Nature 1989;338:394-401.

[2191] 874. (Surp) Surp module

[1] Denhez F, Lafyatis R; Medline: 94266805 "Conservation of regulated alternative splicing and identification of
functional domains in vertebrate homologs to the Drosophila splicing regulator, suppressor-of-white-apricot." J Biol Chem
1994;269:16170-16179.

[2192] This domain is also known as the SWAP domain; SWAP stands for Suppressor-of-White-APricot. It has been
suggested that these domains may be RNA binding [1].

Number of members: 32 

[2193] 875. (TFIIE) TFIIE alpha subunit
The general transcription factor TFIIE has an essential role in eukaryotic transcription initiation together with RNA
polymerase II and other general factors. Human TFIIE consists of two subunits, TFIIE-alpha Swiss:P29083 and
TFIIE-beta Swiss:P29084, and joins the preinitiation complex after RNA polymerase II and TFIIF [1]. This family consists of
the conserved amino terminal region of eukaryotic TFIIE-alpha [2] and proteins from archaebacteria that are presumed
to be TFIIE-alpha subunits also, Swiss:O29501 [3].

Number of members: 12

[2194]

[1] Ohkuma Y, Sumimoto H, Hoffmann A, Shimasaki S, Horikoshi M, Roeder RG; Medline: 92065982 "Structural
motifs and potential sigma homologies in the large subunit of human general transcription factor TFIIE." Nature
1991;354:398-401.

[2] Ohkuma Y, Hashimoto S, Roeder RG, Horikoshi M; Medline: 93087200 "Identification of two large subdomains
in TFIIE-alpha on the basis of homology between Xenopus and human sequences." Nucleic Acids Res 1992;20:
5838-5838.

[3] Klenk HP, Clayton RA, Tomb JF, White O, Nelson KE, Ketchum KA, Dodson RJ, Gwinn M, Hickey EK, Peterson
JD, Richardson DL, Kerlavage AR, Graham DE, Kyrpides NC, Fleischmann RD, Quackenbush J, Lee NH, Sutton
GG, Gill S, Kirkness EF, Dougherty BA, McKenney K, Adams MD, Loftus B, Venter JC, et al; Medline: 98049343






A. Asparaginase 2 

[2212] Asparaginase II (L-asparagine aminohydrolase II) is an extracellular protein that may be associated with the
cell wall and whose expression is affected by the availability of nitrogen. Asparaginase II catalyzes the reaction
L-asparagine + H2O = L-aspartate + NH3. As many leukemias have high requirements for aspartic acid, asparaginase
II proteins are useful as reagents for screening compounds for activity as leukemia chemotherapy products.
Asparaginase II protein can also be over- or under-expressed to alter amino acid content in plant tissues or to modify nitrogen
fixation and/or nitrogen metabolism in plants.

[2213] Ref: Bon et al. (1997) Appl Biochem Biotechnol 63-65: 203-12

B. Chloroa_b-bind

[2214] Chlorophyll a-b binding proteins are located in the thylakoid membranes of the chloroplast and bind chlorophyll
a and chlorophyll b, thereby triggering a chemical reaction (photosynthesis). These proteins are useful in controlling
the rate, efficiency and/or output of photosynthesis. Overexpression of chlorophyll a-b binding proteins is expected to
increase the rate of photosynthesis.

Ref: Leutwiler et al. (1986) Nucleic Acids Res 14: 4051-64 Brandt et al. (1992) Plant Mol Biol 19: 699-703

C. DMRL synthase 

[2215] DMRL Synthase (6,7-Dimethyl-8-Ribityllumazine Synthase) catalyzes the last step in riboflavin (Vitamin B2)
synthesis, condensing 5-amino-6-(1'-D)-ribityl-amino-2,4(1H,3H)-pyrimidinedione with L-3,4-dihydroxy-2-butanone
4-phosphate, producing 6,7-dimethyl-8-(1-D-ribityl)lumazine. The enzyme forms a homopentamer. Engineering of
these proteins or those with homologous sequences/structures may allow control of the amounts of vitamin B2 available
in plants and/or accumulation of pigment, as well as altering reactions requiring hydrogen ion carriers/transmitters.

Ref: Garcia-Ramirez et al. (1995) J Biol Chem 270: 23801-7

D. E1_N 

[2216] These proteins are ATP-dependent DNA helicases that are required for initiation of viral DNA replication. They
form a complex with the viral E2 protein. The E1-E2 complex binds to the replication origin that contains binding sites
for both proteins. The majority of sequences known for this group of proteins are from various papillomaviruses, a type
of double stranded DNA virus. In plants, the prototype double stranded DNA virus is Cauliflower Mosaic virus (CaMV).
Manipulation of these proteins, especially to produce variant proteins that form non-productive complexes, enables
production of plants that are resistant to infection by double stranded DNA viruses.

Ref: Yang et al. (1993) PNAS USA 90: 5086-90
Ustav and Stenlund (1991) EMBO J 10: 449-57
Callaway et al. (1996) Mol Plant Microbe Interact 9: 810-8

E. EF1 G 

[2217] Elongation Factor-1 is composed of four subunits: alpha, beta, delta and gamma. Gamma subunits are
presumed to play a role in anchoring the complex to other cellular components. Studies of EF-1 genes in plants suggest
that different forms of the EF-1 subunits may be expressed in particular organs or in response to stress. Manipulation
of the activity of these proteins, either by altered expression level or by structural mutation, may result in the
accumulation of a particular protein in a chosen organ or allow production of particular proteins during stress conditions.

Ref: Kinzy et al. (1994) NAR 22: 2703-7 Dunn et al. (1993) Plant Mol Biol 23: 221-5 Aguilar et al. (1991) Plant Mol
Biol 17: 351-60

F. ENV polyprotein 

[2218] This family comprises the envelope or coat proteins known from a number of different retroviruses. In
mammalian species, retroviruses are responsible for diseases such as leukemia and HIV. In plants, retroviruses are known
in both monocot (e.g. Zeon-1) and dicot (e.g. Arabidopsis and tobacco) species and have been shown to induce mutant
alleles at new loci. Engineering of plant ENV proteins may allow mobilization or targeting of endogenous or introduced

enzymes expressed in Escherichia coli. Eur J Biochem 1996;235:173-179.

[2] Ragheb JA, Dottin RP; Medline: 87231075 "Structure and sequence of a UDP glucose pyrophosphorylase gene
of Dictyostelium discoideum." Nucleic Acids Res 1987;15:3891-3906.

[3] Mio T, Yabe T, Arisawa M, Yamada-Okabe H; Medline: 98269105 "The eukaryotic UDP-N-acetylglucosamine
pyrophosphorylases. Gene cloning, protein expression, and catalytic mechanism." J Biol Chem 1998;273:
14392-14397.



[2204] 879. (UPF0044) Uncharacterized protein family UPF0044 signature. Cross-reference(s) PS01301; UPF0044
The following uncharacterized proteins have been shown [1] to be highly similar:

- Bacillus subtilis hypothetical protein

- Escherichia coli hypothetical protein yhbY and HI1333, the corresponding Haemophilus influenzae protein.

- Methanococcus jannaschii hypothetical protein MJ0652.

These are small proteins of 10 to 15 Kd. They can be picked up in the database by the following pattern. This pattern
is located in the N-terminal part of these proteins.

[2205] Consensus pattern: L-[ST]-x(3)-K-x(3)-[KR]-[SGA]-x-[GA]-H-x-L-x-P-[LIV]-x(2)-[LIV]-[GA]-x(2)-G
Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected in SWISS-PROT: NONE.
[2206] 880. (zf-A20) A20-like zinc finger. A20- (an inhibitor of cell death)-like zinc fingers. The zinc finger mediates
self-association in A20. These fingers also mediate IL-1-induced NF-kappa B activation.



Number of members: 22
[2207]

[1] Heyninck K, Beyaert R; Medline: 99126071 "The cytokine-inducible zinc finger protein A20 inhibits IL-1-induced
NF-kappaB activation at the level of TRAF6." FEBS Lett 1999;442:147-150.

[2] De Valck D, Heyninck K, Van Criekinge W, Contreras R, Beyaert R, Fiers W; Medline: 96390831 "A20, an inhibitor
of cell death, self-associates by its zinc finger domain." FEBS Lett 1996;384:61-64.

[3] Song HY, Rothe M, Goeddel DV; Medline: 96270609 "The tumor necrosis factor-inducible zinc finger protein
A20 interacts with TRAF1/TRAF2 and inhibits NF-kappaB activation." Proc Natl Acad Sci U S A 1996;93:6721-6725.

[4] Opipari AW Jr, Boguski MS, Dixit VM; Medline: 90368626 "The A20 cDNA induced by tumor necrosis factor
alpha encodes a novel type of zinc finger protein." J Biol Chem 1990;265:14705-14708.

[2208] 881. (zf-PARP) 

Poly(ADP-ribose) polymerase zinc finger domain 

Cross-reference(s) PS00347; PARP_ZN_FINGER_1 PS50064; PARP_ZN_FINGER_2

[2209] Poly(ADP-ribose) polymerase (EC 2.4.2.30) (PARP) [1,2] is a eukaryotic enzyme that catalyzes the covalent
attachment of ADP-ribose units from NAD(+) to various nuclear acceptor proteins. This post-translational modification
of nuclear proteins is dependent on DNA. It appears to be involved in the regulation of various important cellular
processes such as differentiation, proliferation and tumor transformation, as well as in the regulation of the molecular events
involved in the recovery of the cell from DNA damage. Structurally, PARP, about 1000 amino-acid residues long,
consists of three distinct domains: an N-terminal zinc-dependent DNA-binding domain, a central automodification
domain and a C-terminal NAD-binding domain. The DNA-binding region contains a pair of zinc finger domains which
have been shown to bind DNA in a zinc-dependent manner. The zinc finger domains of PARP seem to bind specifically
to single-stranded DNA. DNA ligase III [3] contains, in its N-terminal section, a single copy of a zinc finger highly similar
to those of PARP.

[2210] Consensus pattern: C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-

Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected in

[2211] Note: This documentation entry is linked to both signature patterns and a profile. As the profile is much more
sensitive than the patterns, you should use it if you have access to the necessary software tools to do so.

[1] Althaus F.R., Richter C.R. Mol. Biol. Biochem. Biophys. 37:1-126(1987).

[2] de Murcia G., Menissier de Murcia J. Trends Biochem. Sci. 19:172-176(1994).

[3] Wei Y.-F., Robins P., Carter K., Caldecott K., Pappin D.J.C., Yu G.-L., Wang R.-P., Shell B.K., Nash R.A., Schar
P., Barnes D.E., Haseltine W.A., Lindahl T. Mol. Cell. Biol. 15:3206-3216(1995).

to be regulated by circadian rhythms and in a light dependent manner, occurring at higher levels in roots, for example,
and lower levels in light-grown tissues such as cotyledons. Generally, HMG proteins are thought to influence transcrip- 
tion regulation. In plants, HMGs are believed to have a role in maintaining patterns of circadian-regulated expression 
for other genes, suggesting that these proteins could be exploited to control growth and development. 

Ref: Laudet et al. (1993) Nucleic Acids Res 21: 2493-501 Zheng et al. (1993) Plant Mol Biol 23: 813-23 Grasser et
al. (1993) Plant Mol Biol 23: 619-25

L. IL2

[2224] Interleukin-2 (IL-2) is produced in mammals by T cells in response to antigenic or mitogenic stimulation and
is crucial for proper regulation and functioning of the immune response. IL-2 is capable of stimulating B cells, monocytes,
lymphokine-activated killer cells, natural killer cells and glioma cells. Plant extracts have also been shown to stimulate
the immune system (for example, mistletoe therapy for human cancer). It is known that IL-2 is involved in feedback
inhibition pathways that impact the inflammatory response as well as the growth inhibition of tumor reactive T cells.
Plant proteins containing IL-2-like sequences are useful as immunity-based therapeutics, acting in a manner similar
to IL-2 in mammals.

Ref: Heike et al. (1997) Scand J Immunol 45: 221-6 Ariel et al. (1998) J Immunol 161: 2465-72 Schink (1997)
Anticancer Drugs 8 Suppl 1: S47-51

M. Oxidored_FMN

[2225] NADPH dehydrogenases catalyze the reaction NADPH + acceptor = NADP(+) + reduced acceptor. One member
of this family is yeast "old yellow enzyme" (OYE) and is thought to be involved in oxylipin metabolism. A second
yeast family member is a protein that binds estrogen binding protein (EBP) in addition to exhibiting oxidoreductase
activity. An Arabidopsis homolog to OYE has been described and estrogen binding proteins in plants have been
reported. Plant proteins from this class have the potential to be used to modify lipid metabolism/catabolism. These
proteins may also have use as therapeutics for breast and prostate cancer, and other abnormal growth in steroid-sensitive
tissues.

Ref: Baker et al. (1998) Proc Soc Exp Biol Med 217: 317-21 Schaller and Weiler (1997) J Biol Chem 272: 28066-72
Mandani et al. (1994) PNAS USA 91: 922-6

N. Oxidored_q2 

[2226] The NADH-plastoquinone oxidoreductases catalyze the reaction NADH + plastoquinone = NAD(+) +
plastoquinol. In plants these reactions occur in the chloroplast and are believed to participate in a chloroplast respiratory
system. Here, the NDH complex is postulated to act as a valve to remove excess reduction equivalents in the
chloroplasts. Manipulation of these proteins may improve the rate or efficiency of photosynthesis.

Ref: Burrows et al. (1998) EMBO J 17: 868-76 Kofer et al. (1998) Mol Gen Genet 258: 166-73 Maier et al. (1995) J
Mol Biol 251: 614-28

O. PABP 

[2227] Polyadenylate binding proteins bind the poly(A) tail of mRNA. Plants, as exemplified by Arabidopsis, contain
numerous PABP genes that are expressed in an organ-specific manner. For example, PABP2 is functional in roots and
shoots, while PABP5 is expressed predominantly in immature flowers. The PABP proteins are implicated in numerous
aspects of posttranscriptional regulation including mRNA turnover and translational initiation. Control of activity of PABP
proteins provides the ability to control the expression of various genes in particular organs during development.

Ref: Hilson et al. (1993) Plant Physiol 103: 525-33 Belostotsky and Meagher (1993) PNAS USA 90: 6686-90

P. Parvo_coat

[2228] Parvoviruses are linear single-stranded DNA viruses that are encapsulated by three capsid proteins. Plants
are susceptible to infection by single stranded DNA viruses such as Maize streak virus (MSV) and various Gemini

retroviruses, in essence generating a new method for mutant production, gene tagging and the like.

Ref: Mamoun et al. (1990) J Virol 64: 4180-8 Grandbastien et al. (1989) Nature 337: 376-80 Wright and Voytas (1998)
Genetics 149: 703-15

G. Glycosyl_hydr9

[2219] Proteins having this domain (previously known as the glycosyl hydrolase family 5 domain) catalyze the
endohydrolysis of 1,4-β-D-glucosidic linkages in cellulose. Numerous plant proteins with this domain exist and are
expressed in an organ specific manner. They are involved in the fruit ripening process, in cell elongation and in plant
reproduction. Modulation of the activity of these proteins, either by over- or under-expression or by mutation of the
polypeptide, could be used to affect post-harvest physiology (e.g. rate of ripening) or for engineering reproductive
sterility.

Ref: Gkxda et al. (1990) Biochemistry 29: 7264-9 Tucker et al. (1988) Plant Physiol 88: 1257-62 Shani et al. (1997)
43: 837-42 Milligan and Gasser (1995) Plant Mol Biol 28: 691-711

H. Glycosyl_hydr14

[2220] The β-amylases (family 14 of glycosyl hydrolases) catalyze the hydrolysis of 1,4-α-glucosidic linkages in
polysaccharides and remove successive maltose units from the non-reducing ends of the chains. Mutants of β-amylase
in Arabidopsis exhibited altered degradation of starch throughout the diurnal cycle. In addition, the mutant phenotypes
indicated that these enzymes not only affect carbohydrate metabolism/catabolism, but also influence the amount of
pigment stored within particular cells. Manipulation of the β-amylase genes enables control of plant pigmentation (for
example, fibre pigment in cotton) as well as carbohydrate synthesis and degradation.

Ref: Zeeman et al. (1998) Plant J 15: 357-65 Hirano and Nakamura (1997) Plant Physiol 114: 5675-82 Kitamoto et
al. (1988) J Bacteriol 170: 5848-54

i. GIvcosvl hvdrlS 

[2221] Glycosyl hydrolases from family 1 5 (such as 1 .4-Alpha-D-Glucan glucohydrolase.) catalyze the hydrolysis of 
terminal 1,4-linked alpha-D-glucose residues successively from the non-reducing ends of the chains resulting in the 
release of p-D-Glucose. In plants these proteins have been tied to the mobilizatwn of the xytoglucan stored in the 
cotyledonary cell walls. Proteins such as these couW be varied to affect the rate of plant growth (for example during 
germlnatkxi), storage and/or use of glucose and other sugars by plant tissues and atteratton of the properties, such 
as elastk:ity, of plant cell walls. 

Ref: Crombie et al. (1 998) Plant J 1 5: 27-38 Hata et al. (1 991 ) Agric Bk>l Chem 55: 941 -9 
J. Glycosvl_hvdr20 

[2222] Members of the family 20 glycosyl hydrolases catalyze the hydrolysis of terminal non-reducing N-acetyl-D-hexosamine
residues in N-acetyl-β-D-hexosaminides. N-acetyl-β-glucosaminidase belongs to this family and exists in
several different forms (consisting of various combinations of alpha and beta chains) depending on the organism.
Family 20 glycosyl hydrolases have been implicated in lysosomal storage diseases (such as Sandhoff disease) and
glycogen storage disease in humans. These types of proteins are also responsible for the hydrolysis of chitin. In plants,
these proteins could be useful in controlling carbohydrate catabolism, thereby influencing the amount of sugars available
for storage and/or use in other metabolic pathways. In addition, it is possible that such proteins could be used to
engineer an endogenous insect protection mechanism, e.g. by secretion of a chitin-hydrolyzing composition by the
plant.

Ref: Graham et al. (1988) J Biol Chem 263: 16823-9 O'Dowd et al. (1988) Biochemistry 27: 5216-26

K. HMG box 

[2223] The HMG box is a novel type of DNA-binding domain found in a diverse group of proteins. Numerous plant 
proteins contain this domain, such as the HMG1/2-like proteins. The expression of some of these HMG proteins appears 
U. Signal

[2233] Many plant proteins in this family contain sequences similar to those found in both components of the prokaryotic
family of signal transducers known as the two-component systems. This suggests that activation may require a
transfer of a phosphate group between the transmitter domain and the receiver domain. One family member in Arabidopsis
appears to be involved in ethylene (a plant hormone) signal transduction. Other proteins in this family appear
to be involved in the regulation of gene transcription under conditions of environmental stress. Signal proteins can be
exploited to affect plant growth and development and/or control plant responses to stress conditions such as cold,
nutrient availability, etc.

Ref: Chang et al. (1993) Science 262: 539-44 Nagaya et al. (1993) Gene 131: 119-124 Gottfert et al. (1990) PNAS 
USA 87: 2680-4 

V. vMSA 



[2234] vMSA proteins are major surface antigens present on the envelope of various retroviruses. Surface antigens
of retroviruses are often involved in tropism of the virus. Plants contain retrovirus-like viruses such as pararetroviruses
and retrotransposons (i.e. transposons having long terminal repeats). Plant retrotransposons in particular have
been used to create mutants at various loci, thereby permitting gene isolation, gene tagging and the like. Manipulation
of plant vMSA proteins enables control of the tropism of plant retroviruses that might be used as genetic engineering tools,
thus enabling targeting of the virus to particular species and/or tissues of plants.

Ref: Okamoto et al. (1988) J Gen Virol 69: 2575-83 Grandbastien et al. (1989) Nature 337: 376-80 Wright and Voytas
(1998) Genetics 149: 703-15



W. zf-CCCH 



[2235] This family of proteins is defined by having two CX(8)CX(5)CX(3)H-type zinc finger domains. These proteins
cover a broad range of functions. For example, the COP1 protein acts as a repressor of photomorphogenesis in darkness;
light stimuli abolish this suppressive action. In addition, COP1 protein can function as a negative transcriptional
regulator capable of direct interaction with components of the G-protein signaling pathway. As a second example, a
zf-CCCH protein identified in Arabidopsis appears to be involved in the resistance to DNA damage induced by UV light
and chemical DNA-damaging agents. Overexpression of this class of proteins permits production of plants that are
better suited to adverse environments. Manipulation of expression of zf-CCCH proteins functioning as transcriptional
regulators, such as COP1, enables manipulation of some signal transduction pathways.
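The CX(8)CX(5)CX(3)H spacing that defines this family can be written directly as a pattern. The following is a minimal sketch only, assuming one-letter amino acid codes; the function name and the toy sequence are hypothetical and are not taken from the REF and SEQ TABLES.

```python
import re

# C-x(8)-C-x(5)-C-x(3)-H, where "." stands for any residue (X).
CCCH_PATTERN = re.compile(r"C.{8}C.{5}C.{3}H")


def find_ccch_domains(protein):
    """Return 0-based start positions of non-overlapping CX(8)CX(5)CX(3)H matches."""
    return [match.start() for match in CCCH_PATTERN.finditer(protein.upper())]


# Hypothetical toy sequence carrying two copies of the spacing pattern.
one_finger = "C" + "A" * 8 + "C" + "A" * 5 + "C" + "A" * 3 + "H"
toy_protein = "MK" + one_finger + "GG" + one_finger + "L"

positions = find_ccch_domains(toy_protein)
print(positions, "two-finger candidate:", len(positions) >= 2)
```

A match only shows that the cysteine and histidine spacing is present; it does not show that the region folds as a zinc finger.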

Ref: Pang et al. (1993) Nucleic Acids Res 21: 1647-53 Deng et al. (1992) Cell 71: 791-801

X. zf-RanBP

[2236] Proteins falling within this category contain many X-X-F-G and X-F-X-F-G repeats, and may contain
RANBP1-like or PPIase domains. Plant proteins having domains similar to these include PAS1 and GMSTI. PAS1 has
been shown to have dramatic developmental effects that appear to be correlated with both cell division and cell wall
elongation. GMSTI has high identity to the yeast STI stress-inducible gene and has been shown to be heat inducible.
Proteins such as these may be useful for controlling growth and form of development.

Ref: Vittorioso et al. (1998) Mol Cell Biol 18: 3034-43 Hernandez Torres et al. (1995) 27: 1221-6

Y. Peptidase M48

[2237] Proteins belonging to this peptidase family are metalloproteases that bind zinc as a cofactor and are located
in the membranes of the endoplasmic reticulum. They function in NH2-terminal proteolytic processing, as shown for
the yeast STE24 gene product. This gene is required for the correct processing of a-factor, a yeast pheromone. Family
M48 peptidases also appear to be required for some prenylation reactions, mediating COOH-terminal CAAX processing.
Prenylation reactions are believed to be involved in the regulation of protein-protein and protein-membrane interactions.
As an example, RAS GTPase activity is regulated in part by localization to the inner side of the plasma membrane
upon prenylation. In plants, proteins from this family could be involved in pollen-stigma interactions such as
those mediating self-pollination vs. outcrossing, or could be members of several secondary metabolism pathways.
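To illustrate the COOH-terminal CAAX motif mentioned above, the following minimal sketch checks whether a sequence ends in a cysteine followed by two aliphatic residues and any final residue. The function name, the set of residues treated as aliphatic (A, V, I, L) and the toy sequences are assumptions made for illustration and are not part of the present disclosure.

```python
# Residues treated as "aliphatic" here are an assumption of this sketch.
ALIPHATIC = set("AVIL")


def has_caax_box(protein):
    """Return True if the last four residues fit C-aliphatic-aliphatic-X."""
    tail = protein.upper()[-4:]
    if len(tail) < 4:
        return False
    return tail[0] == "C" and tail[1] in ALIPHATIC and tail[2] in ALIPHATIC


# Hypothetical toy sequences: the first ends in CVLS, the second in SVLS.
print(has_caax_box("MTEYKLVVVGCVLS"))
print(has_caax_box("MTEYKLVVVGSVLS"))
```

A sequence passing this check is only a candidate substrate; whether it is actually prenylated and processed has to be established experimentally.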
viruses. The coat proteins in these plant viruses are critical to the virus life cycle within the plant. For example, the coat
protein of MSV is thought to be involved in intra- and inter-cellular movement within the plant. Engineering of proteins
having similarity to parvoviral coat proteins, especially to produce proteins that interfere with maturation of the virus
particle, enables the production of plants having better resistance to natural plant single-stranded DNA viruses.

Ref: Liu et al. (1997) J Gen Virol 78: 1265-70 Rohde et al. (1990) Virology 176: 648-51

Q. Pkinase C 

[2229] Plant serine/threonine protein kinases possessing this domain are expressed in all tissues and are known to
undergo serine-specific autophosphorylation and specifically phosphorylate two ribosomal proteins, P14 and P16. During
development, these proteins predominate during high metabolic activity in growing buds, root tips, leaf margins
and germinating seeds. They are thought to be involved in the control of plant growth and development. In addition,
two genes encoding proteins from this family have been described that help plant cells adapt during cold or high salt
stresses. Consequently, engineering Pkinase C proteins provides a way to control general growth/development of the
plant as well as a means to provide endogenous protection against environmental stresses.

Ref: Zhang et al. (1994) J Biol Chem 269: 17586-92 Mizoguchi et al. (1995) FEBS Lett 358: 199-204

R. REV

[2230] The REV proteins act post-transcriptionally to relieve negative repression of GAG and ENV production in
retroviruses such as Human Immunodeficiency Virus type 1 (HIV-1). Plants contain retrovirus-like viruses such as
pararetroviruses and retrotransposons (i.e. transposons having long terminal repeats). Plant retrotransposons in particular
have been used to create mutations at various loci, thereby permitting gene isolation, gene tagging and the like.
Manipulation of plant REV proteins enables control of transposition frequencies of corresponding transposable elements
and provides a new tool for genetic engineering of plants.

Ref: Sodroski et al. (1986) Nature 321: 412-7 Franchini et al. (1989) PNAS USA 86: 2433-7 Marquet et al. (1995)
77: 113-24 Grandbastien et al. (1989) Nature 337: 376-80 Wright and Voytas (1998) Genetics 149: 703-15

S. RuBisCo small 

[2231] Ribulose 1,5-bisphosphate carboxylase/oxygenase (RuBisCo) catalyzes the initial step in the C3 photosynthetic
carbon reduction cycle, adding carbon dioxide to D-ribulose 1,5-bisphosphate to form two molecules of 3-phospho-D-glycerate.
RuBisCo is composed of two subunits, one large, which is synthesized in the chloroplast, and one
small, which is synthesized in the cytoplasm and then transported into the chloroplast. The expression of the small
subunit of RuBisCo is light regulated. Manipulation of these proteins could increase the efficiency of photosynthesis
or allow alterations in developmental timing.

Ref: Giuliano et al. (1988) PNAS USA 85: 7089-93 Dedonder et al. (1993) Plant Physiol 101: 801-8

T. Sialyltransf

[2232] Members of the CMP-N-acetylneuraminate-β-galactosamide-α-2,3-sialyltransferase family catalyze the following
reaction:

CMP-N-acetylneuraminate + β-D-galactosyl-1,3-N-acetyl-α-D-galactosaminyl-R = CMP + α-N-acetylneuraminyl-2,3-β-D-galactosyl-1,3-N-acetyl-α-D-galactosaminyl-R.

These proteins are thought to be responsible for the synthesis of
the sequence NeuAc-α-2,3-Gal-β-1,3-GalNAc- found on sugar chains O-linked to threonine or serine and also as a terminal
sequence on certain gangliosides in mammalian cells. In plants, glycosyltransferases in the Golgi apparatus
synthesize cell wall polysaccharides and elaborate the complex glycans of glycoproteins. Engineering of plant sialyltransferases
allows targeting of proteins to particular cellular locations or enables the making of changes in cell wall
structure.

Ref: Wee et al. (1998) Plant Cell 10: 1759-68 Lee et al. (1994) J Biol Chem 269: 10028-33 Kitagawa and Paulson
(1994) J Biol Chem 269: 1394-401
plants, these proteins are located in the thylakoid membranes of the chloroplasts, their expression is light regulated,
and they are thought to be involved in degradation of soluble stromal proteins and turnover of thylakoid proteins [1].
Manipulation of expression and structure of these proteins would have effects on the efficiency of photosynthesis and
the development of chloroplasts.
[2246] Refs 

1 Lindahl M, Tabak S, Cseke L, Pichersky E, Andersson B, Adam Z (1996) J Biol Chem 271: 29329-34.

Ae. UPF0051

[2247] There is some evidence that, in plants, proteins in this family are involved in ATP synthesis in chloroplasts
[1, 2]. Mutations in these proteins or altering their expression would affect the efficiency of photosynthesis and energy
production.
[2248] Refs 

1 Kostrzewa M, Zetsche K (1992) J Mol Biol 227: 961-70.

2 Kostrzewa M, Zetsche K (1993) Plant Mol Biol 23: 67-76.

Af. E7 

[2249] Papillomaviruses are encapsulated double stranded DNA viruses. The Papillomavirus early protein 7 (E7) is
known as a potent immortalizing and transforming agent. Transformation by E7 is thought to be mediated by the physical
association of E7 with cellular proteins regulating entry into the cell cycle [1]. The result is entry into the cell cycle and
suppression of terminal differentiation in mammalian cells. Thus, engineering of proteins having similarity to papillomavirus
E7 protein enables the production of plants having altered cellular proliferation characteristics and possibly
altered morphology. For example, overexpression of E7-like proteins would be expected to result in proliferation of
cells of the tissue in which the E7 protein is expressed, perhaps with suppression of differentiation events. Thus, for
example, overexpression of E7-like proteins in meristem cells can result in taller plants and suppression of leafing and/or
flowering.
[2250] Refs 

1 Zwerschke W, Jansen-Durr P Adv Cancer Res 2000;78:1-29 

Ag. Peptidase U7 

[2251] This protein is known to be an integral membrane protein in the cyanobacterium Synechocystis where it
functions to digest cleaved signal peptides [1]. This activity is necessary to maintain proper secretion of mature proteins
across the membrane. In higher plants this protein may be present in the plastid or chloroplast membranes where it
would function by enabling protein movement into and out of the chloroplasts. Mutations in this protein would be expected
to affect the development of plastids, including chloroplasts, or alter the energy transfer system within the
chloroplasts, thereby affecting growth and development.
[2252] Refs 

1 Kaneko T, Sato S, Kotani H, Tanaka A, Asamizu E, Nakamura Y, Miyajima N, Hirosawa M, Sugiura M, Sasamoto S,
Kimura T, Hosouchi T, Matsuno A, Muraki A, Nakazaki N, Naruo K, Okumura S, Shimpo S, Takeuchi C, Wada T,
Watanabe A, Yamada M, Yasuda M, Tabata S (1996) DNA Res 3: 109-36.

A. Activities of Polypeptides Comprising Signal Peptides

[2253] Polypeptides comprising signal peptides are a family of proteins that are typically (1) targeted to a particular
organelle or intracellular compartment, (2) made to interact with a particular molecule or (3) secreted outside of a host cell.
Examples of polypeptides comprising signal peptides include, without limitation, secreted proteins, soluble proteins,
receptors, proteins retained in the ER, etc.

[2254] These proteins comprising signal peptides are useful to modulate ligand-receptor interactions, cell-to-cell
communication, signal transduction, intracellular communication, and activities and/or chemical cascades that take
part in an organism outside or within any particular cell.

[2255] One class of such proteins is soluble proteins which are transported out of the cell. These proteins can act
as ligands that bind to receptors to trigger signal transduction or to permit communication between cells.
[2256] Another class is receptor proteins which also comprise a retention domain that lodges the receptor protein in
the membrane when the cell transports the receptor to the surface of the cell. Like the soluble ligands, receptors can
also modulate signal transduction and communication between cells.
Ref: Fujimura-Kamada et al. (1997) J Cell Biol 136: 271-85 Tam et al. (1998) J Cell Biol 142: 635-49

Z. DNA Pol Viral N

[2238] The DNA pol Viral N domain is located at the N-terminal region of DNA polymerase isolated from several
retroid viruses such as the Cauliflower Mosaic Virus. The domain motif has also been found in numerous other species
from humans to cyanobacteria. In these organisms, this motif seems to be associated with two types of sequences:
retrotransposons and mitochondrial genes. In the mitochondrial sequences this domain is potentially involved in the
self-splicing conducted by group II introns. Various manipulations of this gene in plants allow control of the numerous
retrotransposons endogenous to plant genomes or allow engineering of mitochondrial function, especially to increase
efficiency of energy utilization by cells.

Ref: Chapdelaine and Bonen (1991) Cell 65: 465-72 Ferat and Michel (1993) Nature 364: 358-61 Wilson et al. (1994)
368: 32-8 Cambareri et al. (1994) 242: 658-65 Gardner et al. (1981) NAR 9: 2871-2888 Cummings et al.
(1990) Curr Genet 17: 375-402 Hattori et al. (1986) Nature 321: 625-8

Aa. Calpain inhib

[2239] This domain is found in calpastatin, an inhibitor protein specific for calpain. Calpain is a non-lysosomal calcium-dependent
intracellular protease that appears to be involved in the dynamic changes of the cytoskeleton, especially
actin-related structures, during early Drosophila embryogenesis [1]. Calpastatins co-exist in cells with calpains and the
subcellular distribution of calpastatin is thought to be important to calpain regulation [2]. In plants calpains and calpastatins
could be involved in embryogenesis and non-embryogenic organ reiteration. Mutations occurring in calpain
inhibitor repeat domains would produce developmental abnormalities such as abnormal leaf, root or flower development.

[2240] Refs 

1 Emori Y and Saigo K (1994) J Biol Chem 269: 25137-42.

2 Mellgren RL, Lane RD, Mericle MT (1989) Biochim Biophys Acta 999: 71-77.

Ab. chorismate bind

[2241] Chorismate binding domains are present in plant anthranilate synthase (AS) genes. AS genes catalyze the
first step in the biosynthesis of tryptophan by converting chorismate and L-glutamine to anthranilate, pyruvate and L-glutamate.
Some of these genes are involved in feedback inhibition by tryptophan [1] while some are feedback insensitive
[2]. In Arabidopsis, two AS genes have overlapping, but different distributions. One of these AS genes is induced
by wounding and bacterial pathogen infiltration [1]. Mutations in the chorismate binding domain would affect the production
of tryptophan and could influence the plant's defense system. AS gene products can be used for in vitro synthesis
of tryptophan and tryptophan derivatives.
[2242] Refs 

1 Niyogi KK, Fink GR (1992) Plant Cell 4: 721-33.

2 Song HS, Brotherton JE, Gonzales RA, Widholm JM (1998) Plant Physiol 117: 533-43.

Ac. late protein L2

[2243] Papillomaviruses are encapsulated double stranded DNA viruses. Plants are susceptible to infection by double
stranded DNA viruses such as Cauliflower Mosaic virus (CaMV). The coat proteins in these plant viruses are critical
to the virus life cycle within the plant. For example, the coat protein of CaMV is thought to be involved in intra- and
inter-cellular movement within the plant [1]. Engineering of proteins having similarity to papillomavirus coat proteins
may enable the production of plants having better resistance to natural plant double stranded DNA viruses.
[2244] Refs 

1 Thompson SR. Melcher U (1993) J Gen Virol 74: 1141^. 

Ad. Peptidase M41 

[2245] Proteins belonging to this peptidase family are metalloproteases that bind zinc as a cofactor and are integral 
membrane proteins. They seem to be involved in the degradation of carboxy-terminal-tagged cytoplasmic proteins. In
strand of RNA. For plant cells, antisense RNA inhibits gene expression by preventing the accumulation of mRNA which
encodes the enzyme of interest; see, e.g., Sheehy et al., Proc. Nat. Acad. Sci. USA 85:8805 (1988), and Hiatt et al.,
U.S. Patent No. 4,801,340.
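The relationship between a coding sequence and its antisense transcript can be sketched as follows. This is a minimal illustration only, assuming an uppercase or lowercase DNA string for the sense (coding) strand; the function names and the six-base toy sequence are hypothetical.

```python
# Maps each DNA base of the sense strand to the RNA base that pairs with it.
DNA_TO_PAIRING_RNA = str.maketrans("ACGT", "UGCA")


def sense_mrna(coding_dna):
    """mRNA has the same sequence as the coding strand, with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def antisense_rna(coding_dna):
    """Antisense RNA is the reverse complement of the coding strand, written 5' to 3'."""
    return coding_dna.upper().translate(DNA_TO_PAIRING_RNA)[::-1]


coding = "ATGGCC"  # hypothetical toy coding sequence
print(sense_mrna(coding))     # AUGGCC
print(antisense_rna(coding))  # GGCCAU
```

Read 3' to 5', the antisense strand GGCCAU is UACCGG, which pairs base for base with the mRNA AUGGCC.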

III.A.2. Ribozymes 

[2270] Similarly, ribozyme constructs can be transformed into a plant to cleave mRNA and down-regulate translation. 
III.A.3. Co-Suppression

[2271] Another method of suppression is by introducing an exogenous copy of the gene to be suppressed. Introduction
of expression cassettes in which a nucleic acid is configured in the sense orientation with respect to the promoter
has been shown to prevent the accumulation of mRNA. This method is described in detail above.

III.A.4. Insertion of Sequences into the Gene to be Modulated

[2272] Yet another means of suppressing gene expression is to insert a polynucleotide into the gene of interest to 
disrupt transcription or translation of the gene. 

[2273] Homologous recombination could be used to target a polynucleotide insert to a gene using the Cre-Lox system
(A.C. Vergunst et al., Nucleic Acids Res. 26:2729 (1998); A.C. Vergunst et al., Plant Mol. Biol. 38:393 (1998); H. Albert
et al., Plant J. 7:649 (1995)).

[2274] In addition, random insertion of polynucleotides into a host cell genome can also be used to disrupt the gene
of interest (Azpiroz-Leehan et al., Trends in Genetics 13:152 (1997)). In this method, screening for clones from a library
containing random insertions is preferred for identifying those that have polynucleotides inserted into the gene of interest.
Such screening can be performed using probes and/or primers described above based on sequences from REF
AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequences thereto. The screening can also
be performed by selecting clones or any transgenic plants having a desired phenotype.

III.A.5. Regulatory Sequence Modulation

[2275] The SDFs described in REF and SEQ TABLES 1 and 2, and fragments thereof, are examples of nucleotides
of the invention that contain regulatory sequences that can be used to suppress or inactivate transcription and/or
translation from a gene of interest as discussed in I.C.5.

III.A.6. Genes Comprising Dominant-Negative Mutations 

[2276] When suppression of production of the endogenous, native protein is desired, it is often helpful to express a
gene comprising a dominant-negative mutation. Production of protein variants from genes comprising dominant-negative
mutations is a useful tool for research. Genes comprising dominant-negative mutations can produce a
variant polypeptide which is capable of competing with the native polypeptide, but which does not produce the native
result. Consequently, overexpression of genes comprising these mutations can titrate out an undesired activity of the
native protein. For example, the product from a gene comprising a dominant-negative mutation of a receptor can be
used to constitutively activate or suppress a signal transduction cascade, allowing examination of the phenotype and
thus the trait(s) controlled by that receptor and pathway. Alternatively, the protein arising from the gene comprising a
dominant-negative mutation can be an inactive enzyme still capable of binding to the same substrate as the native
protein and therefore competes with such native protein.

[2277] Products from genes comprising dominant-negative mutations can also act upon the native protein itself to
prevent activity. For example, the native protein may be active only as a homomultimer or as one subunit of a heteromultimer.
Incorporation of an inactive subunit into the multimer with native subunit(s) can inhibit activity.
[2278] Thus, gene function can be modulated in host cells of interest by insertion into these cells of vector constructs
comprising a gene comprising a dominant-negative mutation.
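The effect of an inactive subunit on a homomultimer can be put in rough numbers. The sketch below is a simplified model, not a measurement: it assumes that native and mutant subunits assemble at random in proportion to their abundance, and that a single mutant subunit is enough to inactivate a multimer. The function name and the example values are hypothetical.

```python
def fraction_fully_native(mutant_fraction, subunits):
    """Expected fraction of multimers that contain only native subunits.

    Under random assembly, each of the `subunits` positions is native with
    probability (1 - mutant_fraction), so all positions are native with
    probability (1 - mutant_fraction) ** subunits.
    """
    if not 0.0 <= mutant_fraction <= 1.0:
        raise ValueError("mutant_fraction must lie between 0 and 1")
    if subunits < 1:
        raise ValueError("subunits must be at least 1")
    return (1.0 - mutant_fraction) ** subunits


# Equal amounts of native and mutant subunit in a tetramer.
print(fraction_fully_native(0.5, 4))  # 0.0625
```

Under these assumptions, expressing the mutant subunit at the same level as the native one would leave about 6% of tetramers fully native, which is why moderate overexpression of a dominant-negative variant can suppress most of the native activity.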

III.B. Enhanced Expression



[2279] Enhanced expression of a gene of interest in a host cell can be accomplished by either (1) insertion of an 
exogenous gene; or (2) promoter modulation. 
[2257] In addition, the signal peptide itself can serve as a ligand for some receptors. An example is the interaction of
the ER targeting signal peptide with the signal recognition particle (SRP). Here, the SRP binds to the signal peptide,
halting translation, and the resulting SRP complex then binds to docking proteins located on the surface of the ER,
prompting transfer of the protein into the ER.
[2258] Signal peptide residue composition is described below in Subsection IV.C.1.

III. Methods of Modulating Polypeptide Production

[2259] It is contemplated that polynucleotides of the invention can be incorporated into a host cell or in vitro system
to modulate polypeptide production. For instance, the SDFs prepared as described herein can be used to prepare
expression cassettes useful in a number of techniques for suppressing or enhancing expression.
[2260] For example, polynucleotides of the present invention comprising sequences to be transcribed, such as coding
sequences, can be inserted into nucleic acid constructs to modulate polypeptide production. Typically, such sequences
to be transcribed are heterologous to at least one element of the nucleic acid construct to generate a chimeric
gene or construct.

[2261] Another example of useful polynucleotides are nucleic acid molecules comprising regulatory sequences of
the present invention. Chimeric genes or constructs can be generated when the regulatory sequences of the invention
are linked to heterologous sequences in a vector construct. Within the scope of the invention are such chimeric genes and/or
constructs.

[2262] Also within the scope of the invention are nucleic acid molecules, whereof at least a part or fragment of these
DNA molecules are presented in REF AND SEQ TABLES 1 AND 2 of the present application, and wherein the coding
sequence is under the control of its own promoter and/or its own regulatory elements. Such molecules are useful for
transforming the genome of a host cell or an organism regenerated from said host cell for modulating polypeptide
production.

[2263] Additionally, a vector capable of producing the oligonucleotide can be inserted into the host cell to deliver the
oligonucleotide.

[2264] More detailed descriptions of components to be included in vector constructs are given both above and
below.

[2265] Whether the chimeric vectors or native nucleic acids are utilized, such polynucleotides can be incorporated
into a host cell to modulate polypeptide production. Native genes and/or nucleic acid molecules can be effective when
exogenous to the host cell.

[2266] Methods of modulating polypeptide expression include, without limitation:
Suppression methods, such as

Antisense
Ribozymes
Co-suppression
Insertion of Sequences into the Gene to be Modulated
Regulatory Sequence Modulation.

as well as Methods for Enhancing Production, such as

Insertion of Exogenous Sequences; and
Regulatory Sequence Modulation.

III.A. Suppression

[2267] Expression cassettes of the invention can be used to suppress expression of endogenous genes which comprise
the SDF sequence. Inhibiting expression can be useful, for instance, to tailor the ripening characteristics of a fruit
(Oeller et al., Science 254:437 (1991)) or to influence seed size (WO98/07842) or to provoke cell ablation (Mariani et
al., Nature 357: 384-387 (1992)).

[2268] As described above, a number of methods can be used to inhibit gene expression in plants, such as antisense,
ribozyme, introduction of exogenous genes into a host cell, insertion of a polynucleotide sequence into the coding
sequence and/or the promoter of the endogenous gene of interest, and the like.

III.A.1. Antisense

[2269] An expression cassette as described above can be transformed into a host cell or plant to produce an antisense
ically included. The polyadenylation region can be derived from the natural gene, from a variety of other plant genes,
or from T-DNA.

[2292] The vector comprising the sequences from genes or SDFs of the invention may comprise a marker gene that
confers a selectable phenotype on plant cells. The vector can include a promoter and coding sequence, for instance.
For example, the marker may encode biocide resistance, particularly antibiotic resistance, such as resistance to kanamycin,
G418, bleomycin, hygromycin, or herbicide resistance, such as resistance to chlorosulfuron or phosphinothricin.

IV.A. Coding Sequences

[2293] Generally, the sequence in the transformation vector and to be introduced into the genome of the host cell
does not need to be absolutely identical to an SDF of the present invention. Also, it is not necessary for it to be full
length, relative to either the primary transcription product or fully processed mRNA. Furthermore, the introduced sequence
need not have the same intron or exon pattern as a native gene. Also, heterologous non-coding segments can
be incorporated into the coding sequence without changing the desired amino acid sequence of the polypeptide to be
produced.

IV.B. Promoters 

[2294] As explained above, introducing an exogenous SDF from the same species or an orthologous SDF from
another species can modulate the expression of a native gene corresponding to that SDF of interest. Such an SDF
construct can be under the control of either a constitutive promoter or a highly regulated inducible promoter (e.g., a
copper inducible promoter). The promoter of interest can initially be either endogenous or heterologous to the species
in question. When re-introduced into the genome of said species, such promoter becomes exogenous to said species.
Over-expression of an SDF transgene can lead to co-suppression of the homologous endogenous sequence, thereby
creating some alterations in the phenotypes of the transformed species as demonstrated by similar analysis of the
chalcone synthase gene (Napoli et al., Plant Cell 2:279 (1990) and van der Krol et al., Plant Cell 2:291 (1990)). If an
SDF is found to encode a protein with desirable characteristics, its over-production can be controlled so that its accumulation
can be manipulated in an organ- or tissue-specific manner utilizing a promoter having such specificity.
[2295] Likewise, if the promoter of an SDF (or an SDF that includes a promoter) is found to be tissue-specific or
developmentally regulated, such a promoter can be utilized to drive or facilitate the transcription of a specific gene of
interest (e.g., seed storage protein or root-specific protein). Thus, the level of accumulation of a particular protein can
be manipulated or its spatial localization in an organ- or tissue-specific manner can be altered.



IV.C. Signal Peptides



[2296] SDFs of the present invention containing signal peptides are indicated in the REF and SEQ TABLES. In some
cases it may be desirable for the protein encoded by an introduced exogenous or orthologous SDF to be targeted (1)
to a particular organelle or intracellular compartment, (2) to interact with a particular molecule such as a membrane molecule
or (3) for secretion outside of the cell harboring the introduced SDF. This will be accomplished using a signal
peptide.

[2297] Signal peptides direct protein targeting, are involved in ligand-receptor interactions and act in cell to cell
communication. Many proteins, especially soluble proteins, contain a signal peptide that targets the protein to one of
several different intracellular compartments. In plants, these compartments include, but are not limited to, the endoplasmic
reticulum (ER), mitochondria, plastids (such as chloroplasts), the vacuole, the Golgi apparatus, protein storage
vesicles (PSV) and, in general, membranes. Some signal peptide sequences are conserved, such as the Asn-Pro-Ile-Arg
amino acid motif found in the N-terminal propeptide signal that targets proteins to the vacuole (Marty (1999)
The Plant Cell 11: 587-599). Other signal peptides do not have a consensus sequence per se, but are largely composed
of hydrophobic amino acids, such as those signal peptides targeting proteins to the ER (Vitale and Denecke (1999)
The Plant Cell 11: 615-628). Still others do not appear to contain either a consensus sequence or an identified common
secondary sequence, for instance the chloroplast stromal targeting signal peptides (Keegstra and Cline (1999) The
Plant Cell 11: 557-570). Furthermore, some targeting peptides are bipartite, directing proteins first to an organelle and
then to a membrane within the organelle (e.g. within the thylakoid lumen of the chloroplast; see Keegstra and Cline
(1999) The Plant Cell 11: 557-570). In addition to the diversity in sequence and secondary structure, placement of the
signal peptide is also varied. Proteins destined for the vacuole, for example, have targeting signal peptides found at
the N-terminus, at the C-terminus and at a surface location in mature, folded proteins. Signal peptides also serve as
ligands for some receptors.
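Two of the features described above, the conserved Asn-Pro-Ile-Arg motif and the hydrophobic character of ER-targeting signal peptides, can be screened for in a simple way. The sketch below is illustrative only: the window sizes, the set of residues counted as hydrophobic, the function names and the toy sequence are assumptions and do not come from the REF and SEQ TABLES.

```python
VACUOLAR_MOTIF = "NPIR"        # Asn-Pro-Ile-Arg in one-letter code
HYDROPHOBIC = set("AVLIMFW")   # assumed set of hydrophobic residues


def find_vacuolar_motif(protein, window=60):
    """Return the 0-based position of NPIR within the first `window` residues, or None."""
    position = protein.upper().find(VACUOLAR_MOTIF, 0, window)
    return None if position == -1 else position


def hydrophobic_fraction(protein, window=20):
    """Fraction of hydrophobic residues among the first `window` residues."""
    region = protein.upper()[:window]
    if not region:
        return 0.0
    return sum(residue in HYDROPHOBIC for residue in region) / len(region)


toy_protein = "MAAAANPIRLLL"  # hypothetical sequence for illustration
print(find_vacuolar_motif(toy_protein))
print(hydrophobic_fraction(toy_protein, window=5))
```

Because many signal peptides lack a consensus sequence, as noted above, a screen of this kind can only flag candidates and cannot rule a sequence out.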

[2298] These characteristics of signal proteins can be used to more tightly control the phenotypic expression of
introduced SDFs. In particular, associating the appropriate signal sequence with a specific SDF can allow sequestering
III.B.1. Insertion of an Exogenous Gene

[2280] Insertion of an expression construct encoding an exogenous gene can boost the number of gene copies
expressed in a host cell.

[2281] Such expression constructs can comprise genes that either encode the native protein that is of interest or
that encode a variant that exhibits enhanced activity as compared to the native protein. Such genes encoding proteins
of interest can be constructed from the sequences from REF AND SEQ TABLES 1 AND 2, fragments thereof, and
substantially similar sequences thereto.

[2282] Such an exogenous gene can include either a constitutive promoter permitting expression in any cell in a host
organism or a promoter that directs transcription only in particular cells or times during a host cell life cycle or in response
to environmental stimuli.

III.B.2. Regulatory Sequence Modulation

[2283] The SDFs of REF and SEQ TABLES 1 AND 2, and fragments thereof, contain regulatory sequences that can
be used to enhance expression of a gene of interest. For example, some of these sequences contain useful enhancer
elements. In some cases, duplication of enhancer elements or insertion of exogenous enhancer elements will increase
expression of a desired gene from a particular promoter. As other examples, all promoters require binding of a
regulatory protein to be activated, while some promoters may need a protein that signals a promoter binding protein
to expose a polymerase binding site. In either case, over-production of such proteins can be used to enhance expression
of a gene of interest by increasing the activation time of the promoter.

[2284] Such regulatory proteins are encoded by some of the sequences in REF AND SEQ TABLES 1 AND 2, fragments
thereof, and substantially similar sequences thereto.

[2285] Coding sequences for these proteins can be constructed as described above.
IV. Gene Constructs and Vector Construction 

[2286] To use isolated SDFs of the present invention or a combination of them or parts and/or mutants and/or fusions
of said SDFs in the above techniques, recombinant DNA vectors which comprise said SDFs and are suitable for
transformation of cells, such as plant cells, are usually prepared. The SDF construct can be made using standard
recombinant DNA techniques (Sambrook et al. 1989) and can be introduced to the species of interest by
Agrobacterium-mediated transformation or by other means of transformation (e.g., particle gun bombardment) as
referenced below.
[2287] The vector backbone can be any of those typical in the art such as plasmids, viruses, artificial chromosomes,
BACs, YACs and PACs and vectors of the sort described by

(a) BAC: Shizuya et al., Proc. Natl. Acad. Sci. USA 89: 8794-8797 (1992); Hamilton et al., Proc. Natl. Acad. Sci.
USA 93: 9975-9979 (1996);

(b) YAC: Burke et al., Science 236:806-812 (1987);

(c) PAC: Sternberg N. et al., Proc Natl Acad Sci U S A. Jan;87(1):103-7 (1990);

(d) Bacteria-Yeast Shuttle Vectors: Bradshaw et al., Nucl Acids Res 23: 4850-4856 (1995);

(e) Lambda Phage Vectors: Replacement Vector, e.g., Frischauf et al., J. Mol Biol 170: 827-842 (1983); or Insertion
vector, e.g., Huynh et al., In: Glover NM (ed) DNA Cloning: A practical Approach, Vol. 1 Oxford: IRL Press (1985);

(f) T-DNA gene fusion vectors: Walden et al., Mol Cell Biol 1: 175-194 (1990); and

(g) Plasmid vectors: Sambrook et al., infra.

[2288] Typically, a vector will comprise the exogenous gene, which in its turn comprises an SDF of the present
invention to be introduced into the genome of a host cell, and which gene may be an antisense construct, a ribozyme
construct, chimeraplast, or a coding sequence with any desired transcriptional and/or translational regulatory
sequences, such as promoters, UTRs, and 3' end termination sequences. Vectors of the invention can also include
origins of replication, scaffold attachment regions (SARs), markers, homologous sequences, introns, etc.
[2289] A DNA sequence coding for the desired polypeptide, for example a cDNA sequence encoding a full length
protein, will preferably be combined with transcriptional and translational initiation regulatory sequences which will
direct the transcription of the sequence from the gene in the intended tissues of the transformed plant.
[2290] For example, for over-expression, a plant promoter fragment may be employed that will direct transcription
of the gene in all tissues of a regenerated plant. Alternatively, the plant promoter may direct transcription of an SDF of
the invention in a specific tissue (tissue-specific promoters) or may be otherwise under more precise environmental
control (inducible promoters).

[2291] If proper polypeptide production is desired, a polyadenylation region at the 3'-end of the coding region is typ-
One of ordinary skill in the art, having this data, can obtain cloned DNA fragments, synthetic DNA fragments or
polypeptides constituting desired sequences by recombinant methodology known in the art or described herein.

EXAMPLES 

[2309] The invention is illustrated by way of the following examples. The invention is not limited by these examples
as the scope of the invention is defined solely by the claims following.

EXAMPLE 1: cDNA PREPARATION 

[2310] A number of the nucleotide sequences disclosed in REF AND SEQ TABLES 1 AND 2 herein as representative
of the SDFs of the invention can be obtained by sequencing genomic DNA (gDNA) and/or cDNA from corn plants
grown from HYBRID SEED # 35A19, purchased from Pioneer Hi-Bred International, Inc., Supply Management, P.O.
Box 256, Johnston, Iowa 50131-0256.

[2311] A number of the nucleotide sequences disclosed in REF AND SEQ TABLES 1 AND 2 herein as representative
of the SDFs of the invention can also be obtained by sequencing genomic DNA from Arabidopsis thaliana,
Wassilewskija ecotype, or by sequencing cDNA obtained from mRNA from such plants as described below. This is a
true breeding strain. Seeds of the plant are available from the Arabidopsis Biological Resource Center at the Ohio
State University, under the accession number CS2360. Seeds of this plant were deposited under the terms and
conditions of the Budapest Treaty at the American Type Culture Collection, Manassas, VA on August 31, 1999, and
were assigned ATCC No. PTA-595.

[2312] Other methods for cloning full-length cDNA are described, for example, by Seki et al., Plant Journal 15: 707-720
(1998) "High-efficiency cloning of Arabidopsis full-length cDNA by biotinylated Cap trapper"; Maruyama et al., Gene
138: 171 (1994) "Oligo-capping a simple method to replace the cap structure of eukaryotic mRNAs with
oligoribonucleotides"; and WO 96/34981.

[2313] Tissues were, or each organ was, individually pulverized and frozen in liquid nitrogen. Next, the samples were
homogenized in the presence of detergents and then centrifuged. The debris and nuclei were removed from the sample
and more detergents were added to the sample. The sample was centrifuged and the debris was removed. Then the
sample was applied to a 2M sucrose cushion to isolate polysomes. The RNA was isolated by treatment with detergents
and proteinase K followed by ethanol precipitation and centrifugation. The polysomal RNA from the different tissues
was pooled according to the following mass ratios: 15/15/1 for male inflorescences, female inflorescences and root,
respectively. The pooled material was then used for cDNA synthesis by the methods described below.
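
The 15/15/1 pooling rule can be illustrated with a short calculation. The following is a minimal sketch, assuming only the mass ratio stated above; the function name and the 310 µg total are hypothetical and chosen for illustration, not taken from the specification.

```python
def pool_masses(total_ug, ratios):
    """Split a total RNA mass (in micrograms) across tissues by mass ratio."""
    parts = sum(ratios.values())
    if parts <= 0:
        raise ValueError("ratios must sum to a positive number")
    return {tissue: total_ug * share / parts for tissue, share in ratios.items()}


# Ratio stated in the text: 15/15/1 (male inflorescence / female inflorescence / root).
CORN_RATIOS = {"male inflorescence": 15, "female inflorescence": 15, "root": 1}

if __name__ == "__main__":
    # 310 µg is an arbitrary illustrative total, not a value from the text.
    for tissue, mass in pool_masses(310.0, CORN_RATIOS).items():
        print(f"{tissue}: {mass:.1f} µg")
```

The same helper would cover any other mass ratio by passing a different dictionary.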
[2314] Starting material for cDNA synthesis for the exemplary corn cDNA clones with sequences presented in REF
AND SEQ TABLES 1 AND 2 was poly(A)-containing polysomal mRNAs from inflorescences and root tissues of corn
plants grown from HYBRID SEED # 35A19. Male inflorescences and female (pre- and post-fertilization) inflorescences
were isolated at various stages of development. Selection for poly(A) containing polysomal RNA was done using oligo
d(T) cellulose columns, as described by Cox and Goldberg, "Plant Molecular Biology: A Practical Approach", pp. 1-35,
Shaw ed., c. 1988 by IRL, Oxford. The quality and the integrity of the polyA+ RNAs were evaluated.
[2315] Starting material for cDNA synthesis for the exemplary Arabidopsis cDNA clones with sequences presented
in REF AND SEQ TABLES 1 AND 2 was polysomal RNA isolated from the top-most inflorescence tissues of Arabidopsis
thaliana Wassilewskija (Ws.) and from roots of Arabidopsis thaliana Landsberg erecta (L. er.), also obtained from the
Arabidopsis Biological Resource Center. Nine parts inflorescence to every part root was used, as measured by wet
mass. Tissue was pulverized and exposed to liquid nitrogen. Next, the sample was homogenized in the presence of
detergents and then centrifuged. The debris and nuclei were removed from the sample and more detergents were
added to the sample. The sample was centrifuged and the debris was removed and the sample was applied to a 2M
sucrose cushion to isolate polysomal RNA (Cox et al., "Plant Molecular Biology: A Practical Approach", pp. 1-35, Shaw
ed., c. 1988 by IRL, Oxford). The polysomal RNA was used for cDNA synthesis by the methods described below.
Polysomal mRNA was then isolated as described above for corn cDNA. The quality of the RNA was assessed
electrophoretically.

[2316] Following preparation of the mRNAs from various tissues as described above, selection of mRNA with intact
5' ends and specific attachment of an oligonucleotide tag to the 5' end of such mRNA was performed using either a
chemical or enzymatic approach. Both techniques take advantage of the presence of the "cap" structure, which
characterizes the 5' end of most intact mRNAs and which comprises a guanosine generally methylated once, at the 7
position.

[2317] The chemical modification approach involves the optional elimination of the 2', 3'-cis diol of the 3' terminal
ribose, the oxidation of the 2', 3'-cis diol of the ribose linked to the cap of the 5' ends of the mRNAs into a dialdehyde,
and the coupling of the such obtained dialdehyde to a derivatized oligonucleotide tag. Further detail regarding the
chemical approaches for obtaining mRNAs having intact 5' ends are disclosed in International Application No.
of the protein in specific organelles (plastids, as an example), secretion outside of the cell, targeting interaction with
particular receptors, etc. Hence, the inclusion of signal proteins in constructs involving the SDFs of the invention
increases the range of manipulation of SDF phenotypic expression. The nucleotide sequence of the signal peptide can
be isolated from characterized genes using common molecular biological techniques or can be synthesized in vitro.
[2299] In addition, the native signal peptide sequences, both amino acid and nucleotide, described in the REF and
SEQ tables can be used to modulate polypeptide transport. Further variants of the native signal peptides described in
the REF and SEQ tables are contemplated. Insertions, deletions, or substitutions can be made. Such variants will
retain at least one of the functions of the native signal peptide as well as exhibiting some degree of sequence identity
to the native sequence.

[2300] Also, fragments of the signal peptides of the invention are useful and can be fused with other signal peptides
of interest to modulate transport of a polypeptide.

V. Transformation Techniques

[2301] A wide range of techniques for inserting exogenous polynucleotides are known for a number of host cells,
including, without limitation, bacterial, yeast, mammalian, insect and plant cells.

[2302] Techniques for transforming a wide variety of higher plant species are well known and described in the
technical and scientific literature. See, e.g., Weising et al., Ann. Rev. Genet. 22:421 (1988); and Christou, Euphytica,
v. 85, n. 1-3: 13-27, (1995).

[2303] DNA constructs of the invention may be introduced into the genome of the desired plant host by a variety of
conventional techniques. For example, the DNA construct may be introduced directly into the genomic DNA of the
plant cell using techniques such as electroporation and microinjection of plant cell protoplasts, or the DNA constructs
can be introduced directly to plant tissue using ballistic methods, such as DNA particle bombardment. Alternatively,
the DNA constructs may be combined with suitable T-DNA flanking regions and introduced into a conventional
Agrobacterium tumefaciens host vector. The virulence functions of the Agrobacterium tumefaciens host will direct the
insertion of the construct and adjacent marker into the plant cell DNA when the cell is infected by the bacteria
(McCormac et al., Mol. Biotechnol. 8:199 (1997); Hamilton, Gene 200:107 (1997); Salomon et al., EMBO J. 3:141
(1984); Herrera-Estrella et al., EMBO J. 2:987 (1983)).

[2304] Microinjection techniques are known in the art and well described in the scientific and patent literature. The
introduction of DNA constructs using polyethylene glycol precipitation is described in Paszkowski et al., EMBO J. 3:
2717 (1984). Electroporation techniques are described in Fromm et al., Proc. Natl. Acad. Sci. USA 82:5824 (1985).
Ballistic transformation techniques are described in Klein et al., Nature 327:773 (1987). Agrobacterium tumefaciens
mediated transformation techniques, including disarming and use of binary or cointegrate vectors, are well described
in the scientific literature. See, for example, Hamilton, CM, Gene 200:107 (1997); Muller et al., Mol. Gen. Genet. 207:
171 (1987); Komari et al., Plant J. 10:165 (1996); Venkateswarlu et al., Biotechnology 9:1103 (1991) and Gleave, AP.,
Plant Mol. Biol. 20:1203 (1992); Graves and Goldman, Plant Mol. Biol. 7:34 (1986) and Gould et al., Plant Physiology
95:426 (1991).

[2305] Transformed plant cells which are derived by any of the above transformation techniques can be cultured to
regenerate a whole plant that possesses the transformed genotype and thus the desired phenotype such as
seedlessness. Such regeneration techniques rely on manipulation of certain phytohormones in a tissue culture growth
medium, typically relying on a biocide and/or herbicide marker which has been introduced together with the desired
nucleotide sequences. Plant regeneration from cultured protoplasts is described in Evans et al., Protoplasts Isolation
and Culture in "Handbook of Plant Cell Culture," pp. 124-176, MacMillan Publishing Company, New York, 1983; and
Binding, Regeneration of Plants, Plant Protoplasts, pp. 21-73, CRC Press, Boca Raton, 1988. Regeneration can also
be obtained from plant callus, explants, organs, or parts thereof. Such regeneration techniques are described generally
in Klee et al., Ann. Rev. of Plant Phys. 38:467 (1987). Regeneration of monocots (rice) is described by Hosoyama et
al. (Biosci. Biotechnol. Biochem. 58:1500 (1994)) and by Ghosh et al. (J. Biotechnol. 1 (1994)). The nucleic acids of
the invention can be used to confer desired traits on essentially any plant.

[2306] Thus, the invention has use over a broad range of plants, including species from the genera Anacardium,
Arachis, Asparagus, Atropa, Avena, Brassica, Citrus, Citrullus, Capsicum, Carthamus, Cocos, Coffea, Cucumis,
Cucurbita, Daucus, Elaeis, Fragaria, Glycine, Gossypium, Helianthus, Heterocallis, Hordeum, Hyoscyamus, Lactuca,
Linum, Lolium, Lupinus, Lycopersicon, Malus, Manihot, Majorana, Medicago, Nicotiana, Olea, Oryza, Panieum,
Pannesetum, Persea, Phaseolus, Pistachia, Pisum, Pyrus, Prunus, Raphanus, Ricinus, Secale, Senecio, Sinapis,
Solanum, Sorghum, Theobromus, Trigonella, Triticum, Vicia, Vitis, Vigna, and Zea.
[2307] One of skill will recognize that after the expression cassette is stably incorporated in transgenic plants and
confirmed to be operable, it can be introduced into other plants by sexual crossing. Any of a number of standard
breeding techniques can be used, depending upon the species to be crossed.

[2308] The particular sequences of SDFs identified are provided in the attached REF AND SEQ TABLES 1 AND 2.
tions result in detection of hybridization between sequences having at least 70% sequence identity. As described above,
the hybridization and wash conditions can be changed to reflect the desired percentage of sequence identity between
probe and target sequences that can be detected.

[2329] In the following procedure, a probe for hybridization is produced from two PCR reactions using two primers
from genomic sequence of Arabidopsis thaliana. As described above, the particular template for generating the probe
can be any desired template.

[2330] The first PCR product is assessed to confirm that it is of the expected size. Then
the product of the first PCR is used as a template, with the same pair of primers used in the first PCR, in a second
PCR that produces a labeled product used as the probe.

[2331] Fragments detected by hybridization, or other bands of interest, can be isolated from gels used to separate
genomic DNA fragments by known methods for further purification and/or characterization.

Buffers for nuclear DNA extraction

[2332]

1. 10X HB

Component                Per 1000 ml   Purpose
40 mM spermidine         10.2 g        Spermine (Sigma S-2876) and spermidine (Sigma S-2501)
10 mM spermine           3.5 g         stabilize chromatin and the nuclear membrane
0.1 M EDTA (disodium)    37.2 g        EDTA inhibits nuclease
0.1 M Tris               12.1 g        Buffer
0.8 M KCl                59.6 g        Adjusts ionic strength for stability of nuclei

Adjust pH to 9.5 with 10 N NaOH. It appears that there is a nuclease present in leaves. Use of pH 9.5 appears
to inactivate this nuclease.
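
The gram amounts in the 10X HB recipe follow from molarity times formula weight times volume. The sketch below checks that arithmetic. The formula weights are assumptions: the specification does not state which salt or hydrate forms were used, and the forms named here were chosen only because they reproduce the listed masses.

```python
# Assumed formula weights (g/mol). The salt/hydrate forms are NOT stated in the
# text; they are assumptions that happen to reproduce the listed gram amounts.
ASSUMED_MW = {
    "spermidine": 254.63,  # assumed trihydrochloride
    "spermine": 348.18,    # assumed tetrahydrochloride
    "EDTA": 372.24,        # assumed disodium salt, dihydrate
    "Tris": 121.14,        # assumed Tris base
    "KCl": 74.55,
}

# component -> (molar concentration from the recipe, grams listed per 1000 ml)
RECIPE_10X_HB = {
    "spermidine": (0.040, 10.2),
    "spermine": (0.010, 3.5),
    "EDTA": (0.1, 37.2),
    "Tris": (0.1, 12.1),
    "KCl": (0.8, 59.6),
}


def grams_needed(molar, formula_weight, volume_l=1.0):
    """Mass in grams = mol/L x g/mol x L."""
    return molar * formula_weight * volume_l


def check_recipe(recipe, weights, volume_l=1.0):
    """Return {component: (computed grams, listed grams)} for comparison."""
    return {
        name: (grams_needed(molar, weights[name], volume_l), listed)
        for name, (molar, listed) in recipe.items()
    }


if __name__ == "__main__":
    for name, (computed, listed) in check_recipe(RECIPE_10X_HB, ASSUMED_MW).items():
        print(f"{name}: computed {computed:.2f} g, listed {listed} g")
```

Passing a different `volume_l` scales the recipe, for example `volume_l=0.5` for a 500 ml batch.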



2. 2 M sucrose (684 g per 1000 ml)

Heat about half the final volume of water to about 50°C. Add the sucrose slowly then bring the mixture to close
to final volume; stir constantly until it has dissolved. Bring the solution to volume.



3. Sarkosyl solution (lyses nuclear membranes)

Component                        Per 1000 ml
N-lauroyl sarcosine (Sarkosyl)   20.0 g
0.1 M Tris                       12.1 g
0.04 M EDTA (Disodium)           14.9 g

Adjust the pH to 9.5 after all the components are dissolved and bring up to the proper volume.



4. 20% Triton X-100

80 ml Triton X-100

320 ml 1X HB (w/o β-ME and PMSF)

Prepare in advance; Triton takes some time to dissolve

A. Procedure 

[2333] 

1. Prepare 1X "H" buffer (keep ice-cold during use)

Component   Per 1000 ml
10X HB      100 ml

WO 96/34981, published November 7, 1996.

[2318] The enzymatic approach for ligating the oligonucleotide tag to the intact 5' ends of mRNAs involves the removal
of the phosphate groups present on the 5' ends of uncapped incomplete mRNAs, the subsequent decapping of mRNAs
having intact 5' ends and the ligation of the phosphate present at the 5' end of the decapped mRNA to an oligonucleotide
tag. Further detail regarding the enzymatic approaches for obtaining mRNAs having intact 5' ends are disclosed in
Dumas Milne Edwards J.B. (Doctoral Thesis of Paris VI University, Le clonage des ADNc complets: difficultés et
perspectives nouvelles. Apports pour l'étude de la régulation de l'expression de la tryptophane hydroxylase de rat, 20
Dec. 1993), EPO 625572 and Kato et al., Gene 150:243-250 (1994).

[2319] In both the chemical and the enzymatic approach, the oligonucleotide tag has a restriction enzyme site (e.g.
an EcoRI site) therein to facilitate later cloning procedures. Following attachment of the oligonucleotide tag to the
mRNA, the integrity of the mRNA is examined by performing a Northern blot using a probe complementary to the
oligonucleotide tag.

[2320] For the mRNAs joined to oligonucleotide tags using either the chemical or the enzymatic method, first strand
cDNA synthesis is performed using an oligo-dT primer with reverse transcriptase. This oligo-dT primer can contain an
internal tag of at least 4 nucleotides, which can be different from one mRNA preparation to another. Methylated dCTP
is used for cDNA first strand synthesis to protect the internal EcoRI sites from digestion during subsequent steps. The
first strand cDNA is precipitated using isopropanol after removal of RNA by alkaline hydrolysis to eliminate residual
primers.

[2321] Second strand cDNA synthesis is conducted using a DNA polymerase, such as Klenow fragment, and a primer
corresponding to the 5' end of the ligated oligonucleotide. The primer is typically 20-25 bases in length. Methylated
dCTP is used for second strand synthesis in order to protect internal EcoRI sites in the cDNA from digestion during
the cloning process.

[2322] Following second strand synthesis, the full-length cDNAs are cloned into a phagemid vector, such as
pBlueScript™ (Stratagene). The ends of the full-length cDNAs are blunted with T4 DNA polymerase (Biolabs) and the
cDNA is digested with EcoRI. Since methylated dCTP is used during cDNA synthesis, the EcoRI site present in the tag
is the only hemi-methylated site; hence the only site susceptible to EcoRI digestion. In some instances, to facilitate
subcloning, a Hind III adapter is added to the 3' end of full-length cDNAs.

[2323] The full-length cDNAs are then size fractionated using either exclusion chromatography (AcA, Biosepra) or
electrophoretic separation which yields 3 to 6 different fractions. The full-length cDNAs are then directionally cloned
into pBlueScript™ using either the EcoRI and SmaI restriction sites or, when the Hind III adapter is present in
the full-length cDNAs, the EcoRI and Hind III restriction sites. The ligation mixture is transformed, preferably by
electroporation, into bacteria, which are then propagated under appropriate antibiotic selection.
[2324] Clones containing the oligonucleotide tag attached to full-length cDNAs are selected as follows.
[2325] The plasmid cDNA libraries made as described above are purified (e.g. by a column available from Qiagen).
A positive selection of the tagged clones is performed as follows. Briefly, in this selection procedure, the plasmid DNA
is converted to single stranded DNA using phage F1 gene II endonuclease in combination with an exonuclease (Chang
et al., Gene 127:95 (1993)) such as exonuclease III or T7 gene 6 exonuclease. The resulting single stranded DNA is
then purified using paramagnetic beads as described by Fry et al., Biotechniques 13:124 (1992). Here the single
stranded DNA is hybridized with a biotinylated oligonucleotide having a sequence corresponding to the 3' end of the
oligonucleotide tag. Preferably, the primer has a length of 20-25 bases. Clones including a sequence complementary
to the biotinylated oligonucleotide are selected by incubation with streptavidin coated magnetic beads followed by
magnetic capture. After capture of the positive clones, the plasmid DNA is released from the magnetic beads and
converted into double stranded DNA using a DNA polymerase such as ThermoSequenase™ (obtained from Amersham
Pharmacia Biotech). Alternatively, protocols such as the Gene Trapper™ kit (Gibco BRL) can be used. The double
stranded DNA is then transformed, preferably by electroporation, into bacteria. The percentage of positive clones
having the 5' tag oligonucleotide is typically estimated to be between 90 and 98% from dot blot analysis.
[2326] Following transformation, the libraries are ordered in microtiter plates and sequenced. The Arabidopsis library
was deposited at the American Type Culture Collection on January 7, 2000 as "E-coli liba 010600" under the accession
number PTA-1161.

EXAMPLE 2: SOUTHERN HYBRIDIZATIONS 

[2327] The SDFs of the invention can be used in Southern hybridizations as described above. The following describes 
extraction of DNA from nuclei of plant cells, digestion of the nuclear DNA and separation by length, transfer of the 
separated fragments to membranes, preparation of probes for hybridization, hybridization and detection of the hybrid- 
ized probe. 

[2328] The procedures described herein can be used to isolate related polynucleotides or for diagnostic purposes.
Moderate stringency hybridization conditions, as defined above, are described in the present example. These condi-
12. Centrifuge at 164,000 x g for 16 to 20 hours in a fixed-angle rotor.

13. Remove the dark red supernatant that is at the top of the tube with a plastic transfer pipette and discard.
Carefully remove the DNA band with another transfer pipette. The DNA band is usually visible in room light;
otherwise, use a long wave UV light to locate the band.

14. Extract the ethidium bromide with isopropanol saturated with water and salt. Once the solution is clear, extract
at least two more times to ensure that all of the EtBr is gone. Be very gentle, as it is very easy to shear the DNA
at this step. This extraction may take a while because the DNA solution tends to be very viscous. If the solution is
too viscous, dilute it with TE.

15. Dialyze the DNA for at least two days against several changes (at least three times) of TE (10 mM Tris, 1 mM
EDTA, pH 8) to remove the cesium chloride.

16. Remove the dialyzed DNA from the tubing. If the dialyzed DNA solution contains a lot of debris, centrifuge the
DNA solution at least at 2500 x g for 10 min. and carefully transfer the clear supernatant to a new tube. Read the
A260 concentration of the DNA.

17. Assess the quality of the DNA by agarose gel electrophoresis (1% agarose gel) of the DNA. Load 50 ng and
100 ng (based on the OD reading) and compare it with known and good quality DNA. Undigested lambda DNA
and a lambda-HindIII-digested DNA are good molecular weight markers.
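
Steps 16 and 17 rely on converting an A260 reading into a DNA concentration and then into a loading volume. The following is a minimal sketch of that conversion, assuming the commonly used factor of 50 µg/ml per A260 unit for double-stranded DNA at a 1 cm path length; the specification itself does not state the factor, and the reading and dilution used in the demonstration are hypothetical.

```python
# Commonly used conversion for double-stranded DNA at 1 cm path length.
# This factor is an assumption; the text only says "based on the OD reading".
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dna_conc_ng_per_ul(a260, dilution_factor=1.0):
    """Concentration of the undiluted stock; µg/ml is numerically equal to ng/µl."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def load_volume_ul(target_ng, conc_ng_per_ul):
    """Volume of stock that contains target_ng of DNA."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ng / conc_ng_per_ul


if __name__ == "__main__":
    # Hypothetical reading: A260 = 0.5 measured on a 1:20 dilution.
    conc = dna_conc_ng_per_ul(0.5, dilution_factor=20)
    for target in (50, 100):  # the two gel loads named in step 17
        print(f"{target} ng -> {load_volume_ul(target, conc):.2f} µl of stock")
```

When the computed volume is too small to pipette reliably, a working dilution of the stock can be made first and the same function applied to the diluted concentration.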

Protocol for Digestion of Genomic DNA 

Protocol: 
[2334] 

1. The relative amounts of DNA for different crop plants that provide approximately a balanced number of genome
equivalents are given in Table 3. Note that due to the size of the wheat genome, wheat DNA will be underrepresented.
Lambda DNA provides a useful control for complete digestion.

2. Precipitate the DNA by adding 3 volumes of 100% ethanol. Incubate at -20°C for at least two hours. Yeast DNA
can be purchased and made up at the necessary concentration, therefore no precipitation is necessary for yeast
DNA.

3. Centrifuge the solution at 11,400 x g for 20 min. Decant the ethanol carefully (be careful not to disturb the pellet).
Be sure that the residual ethanol is completely removed either by vacuum desiccation or by carefully wiping the
sides of the tubes with a clean tissue.

4. Resuspend the pellet in an appropriate volume of water. Be sure the pellet is fully resuspended before proceeding
to the next step. This may take about 30 min.

5. Add the appropriate volume of 10X reaction buffer provided by the manufacturer of the restriction enzyme to
the resuspended DNA followed by the appropriate volume of enzymes. Be sure to mix it properly by slowly swirling
the tubes.

6. Set up the lambda digestion-control for each DNA that you are digesting.

7. Incubate both the experimental and lambda digests overnight at 37°C. Spin down condensation in a microfuge
before proceeding.

8. After digestion, add 2 µl of loading dye (typically 0.25% bromophenol blue, 0.25% xylene cyanol in 15% Ficoll
or 30% glycerol) to the lambda-control digests and load in 1% TPE-agarose gel (TPE is 90 mM Tris-phosphate, 2
mM EDTA, pH 8). If the lambda DNA in the lambda control digests is completely digested, proceed with the
precipitation of the genomic DNA in the digests.

9. Precipitate the digested DNA by adding 3 volumes of 100% ethanol and incubating at -20°C for at least 2 hours

(continued)

Component            Per 1000 ml   Purpose
2 M sucrose          250 ml        a non-ionic osmoticum
Water                634 ml

Added just before use:
100 mM PMSF*         10 ml         a protease inhibitor; protects nuclear membrane proteins
β-mercaptoethanol    1 ml          inactivates nuclease by reducing disulfide bonds

*100 mM PMSF (phenyl methyl sulfonyl fluoride, Sigma P-7626): add 0.0875 g to 5 ml 100% ethanol

2. Homogenize the tissue in a blender (use 300-400 ml of 1X HB per blender). Be sure that you use 5-10 ml of HB
buffer per gram of tissue. Blenders generate heat so be sure to keep the homogenate cold. It is necessary to put
the blenders in ice periodically.

3. Add the 20% Triton X-100 (25 ml per liter of homogenate) and gently stir on ice for 20 min. This lyses plastid,
but not nuclear, membranes.

4. Filter the tissue suspension through several nylon filters into an ice-cold beaker. The first filtration is through a
250-micron membrane; the second is through an 85-micron membrane; the third is through a 50-micron membrane;
and the fourth is through a 20-micron membrane. Use a large funnel to hold the filters. Filtration can be sped up
by gently squeezing the liquid through the filters.

5. Centrifuge the filtrate at 1200 x g for 20 min. at 4°C to pellet the nuclei.

6. Discard the dark green supernatant. The pellet will have several layers to it. One is starch; it is white and gritty.
The nuclei are gray and soft. In the early steps, there may be a dark green and somewhat viscous layer of
chloroplasts.

Wash the pellets in about 25 ml cold H buffer (with Triton X-100) and resuspend by swirling gently and pipetting.

After the pellets are resuspended, pellet the nuclei again at 1200 - 1300 x g. Discard the supernatant.

Repeat the wash 3-4 times until the supernatant has changed from a dark green to a pale green. This usually
happens after 3 or 4 resuspensions. At this point, the pellet is typically grayish white and very slippery. The
Triton X-100 in these repeated steps helps to destroy the chloroplasts and mitochondria that contaminate the
prep. Resuspend the nuclei for a final time in a total of 15 ml of H buffer and transfer the suspension to a sterile
125 ml Erlenmeyer flask.

7. Add 15 ml, dropwise, cold 2% Sarkosyl, 0.1 M Tris, 0.04 M EDTA solution (pH 9.5) while swirling gently. This
lyses the nuclei. The solution will become very viscous.

8. Add 30 grams of CsCl and gently swirl at room temperature until the CsCl is in solution. The mixture will be gray
white and viscous.

9. Centrifuge the solution at 11,400 x g at 4°C for at least 30 min. The longer this spin is, the firmer the protein pellicle.

10. The result is typically a clear green supernatant over a white pellet and (perhaps) under a protein pellicle.
Carefully remove the solution under the protein pellicle and above the pellet. Determine the density of the solution
by weighing 1 ml of solution and add CsCl if necessary to bring to 1.57 g/ml. The solution contains dissolved solids
(sucrose etc) and the refractive index alone will not be an accurate guide to CsCl concentration.



11. Add 20 µl of 10 mg/ml EtBr per ml of solution.
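
Several steps of the nuclear DNA preparation give volumes as proportions (HB buffer per gram of tissue, Triton X-100 stock per liter of homogenate, EtBr per ml of solution). The sketch below scales those proportions to a given preparation. The function names and the example inputs are hypothetical, and the derived final concentrations are simple dilution arithmetic rather than figures stated in the specification.

```python
def hb_volume_range_ml(tissue_g):
    """HB buffer needed at 5-10 ml per gram of tissue: (minimum, maximum)."""
    return (5.0 * tissue_g, 10.0 * tissue_g)


def triton_stock_ml(homogenate_ml):
    """20% Triton X-100 stock to add at 25 ml per liter of homogenate."""
    return 25.0 * homogenate_ml / 1000.0


def triton_final_percent(homogenate_ml, stock_percent=20.0):
    """Final Triton X-100 percentage after the stock is added (derived value)."""
    stock_ml = triton_stock_ml(homogenate_ml)
    return stock_percent * stock_ml / (homogenate_ml + stock_ml)


def etbr_ul(solution_ml):
    """10 mg/ml EtBr stock to add at 20 µl per ml of solution."""
    return 20.0 * solution_ml


def etbr_final_mg_per_ml(stock_mg_per_ml=10.0):
    """Final EtBr concentration after adding 0.020 ml stock to 1 ml (derived value)."""
    return stock_mg_per_ml * 0.020 / (1.0 + 0.020)


if __name__ == "__main__":
    # 40 g of tissue and 30 ml of CsCl solution are illustrative inputs only.
    low, high = hb_volume_range_ml(40)
    print(f"HB buffer: {low:.0f}-{high:.0f} ml")
    print(f"Triton stock for 400 ml homogenate: {triton_stock_ml(400):.1f} ml")
    print(f"Final Triton: {triton_final_percent(400):.2f} %")
    print(f"EtBr for 30 ml: {etbr_ul(30):.0f} µl")
```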
B. Protocol for PCR Amplification of Genomic Fragments in Arabidopsis

Amplification procedures: 

[2336] 

1. Mix the following in a 0.20 ml PCR tube or 96-well PCR plate:

Volume       Stock                                                      Final Amount or Conc.
0.5 µl       ~10 ng/µl genomic DNA                                      5 ng
2.5 µl       10X PCR buffer                                             20 mM Tris, 50 mM KCl
0.75 µl      50 mM MgCl2                                                1.5 mM
1 µl         10 pmol/µl Primer 1 (Forward)                              10 pmol
1 µl         10 pmol/µl Primer 2 (Reverse)                              10 pmol
0.5 µl       5 mM dNTPs                                                 0.1 mM
0.1 µl       5 units/µl Platinum Taq™ (Life Technologies,               1 units
             Gaithersburg, MD) DNA Polymerase
(to 25 µl)   Water
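
The final amounts in the reaction mix follow from stock concentration times volume, divided by the 25 µl total for concentrations. The following is a minimal sketch of that check using the volumes and stocks from the table. The 1 µl volume for Primer 2 is an assumption, since that cell did not survive in the source and is inferred from the 10 pmol final amount. Note that 0.1 µl of a 5 units/µl stock computes to 0.5 unit, whereas the table lists 1 units; the source does not resolve this, so the code only reports the computed value.

```python
TOTAL_UL = 25.0

# name -> (volume in µl, stock concentration per µl or in mM)
VOLUMES_UL = {
    "genomic DNA": 0.5,
    "10X PCR buffer": 2.5,
    "MgCl2": 0.75,
    "Primer 1": 1.0,
    "Primer 2": 1.0,  # assumed; the volume cell is missing in the source table
    "dNTPs": 0.5,
    "Taq": 0.1,
}


def final_conc(stock_conc, volume_ul, total_ul=TOTAL_UL):
    """Dilution rule C1*V1 = C2*V2, solved for C2 (same unit as stock_conc)."""
    return stock_conc * volume_ul / total_ul


def amount(stock_per_ul, volume_ul):
    """Absolute amount delivered (e.g. ng, pmol, units)."""
    return stock_per_ul * volume_ul


def water_ul(volumes, total_ul=TOTAL_UL):
    """Water needed to bring the mix to the total volume."""
    return total_ul - sum(volumes.values())


if __name__ == "__main__":
    print(f"MgCl2 final: {final_conc(50.0, VOLUMES_UL['MgCl2']):.2f} mM")
    print(f"dNTPs final: {final_conc(5.0, VOLUMES_UL['dNTPs']):.2f} mM")
    print(f"DNA amount: {amount(10.0, VOLUMES_UL['genomic DNA']):.1f} ng")
    print(f"Taq amount (computed): {amount(5.0, VOLUMES_UL['Taq']):.2f} units")
    print(f"Water: {water_ul(VOLUMES_UL):.2f} µl")
```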





2. The template DNA is amplified using a Perkin Elmer 9700 PCR machine:

1) 94°C for 10 min, followed by
2) 5 cycles:   94°C - 30 sec; 62°C - 30 sec; 72°C - 3 min
3) 5 cycles:   94°C - 30 sec; 58°C - 30 sec; 72°C - 3 min
4) 25 cycles:  94°C - 30 sec; 53°C - 30 sec; 72°C - 3 min
5) 72°C for 7 min. Then the reactions are stopped by chilling to 4°C.
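
The cycling program steps the annealing temperature down from 62°C to 58°C to 53°C across three blocks of cycles. As an illustration, the program can be written as data and its total programmed hold time summed. This is a sketch only: the data layout is hypothetical, and the total excludes temperature ramping and the final 4°C chill, which the specification does not time.

```python
# Each entry: (number of cycles, [(temperature in °C, hold time in seconds), ...])
PROGRAM = [
    (1, [(94, 600)]),                         # initial 10 min at 94°C
    (5, [(94, 30), (62, 30), (72, 180)]),
    (5, [(94, 30), (58, 30), (72, 180)]),
    (25, [(94, 30), (53, 30), (72, 180)]),
    (1, [(72, 420)]),                         # final 7 min at 72°C
]


def hold_time_s(program):
    """Total programmed hold time in seconds (ramping time not included)."""
    return sum(cycles * sum(sec for _temp, sec in steps) for cycles, steps in program)


def amplification_cycles(program):
    """Number of three-step (denature/anneal/extend) cycles."""
    return sum(cycles for cycles, steps in program if len(steps) == 3)


if __name__ == "__main__":
    total = hold_time_s(PROGRAM)
    print(f"{amplification_cycles(PROGRAM)} cycles, {total // 60} min of hold time")
```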



[2337] The procedure can be adapted to a multi-well format if necessary. 

Quantification and Dilution of PCR Products:

[2338]

1. The product of the PCR is analyzed by electrophoresis in a 1% agarose gel. A linearized plasmid DNA can be
used as a quantification standard (usually at 50, 100, 200, and 400 ng). These will be used as references to
approximate the amount of PCR products. HindIII-digested Lambda DNA is useful as a molecular weight marker.
The gel can be run fairly quickly; e.g., at 100 volts. The standard gel is examined to determine that the size of the
PCR products is consistent with the expected size and if there are significant extra bands or smeary products in
the PCR reactions.

2. The amounts of PCR products can be estimated on the basis of the plasmid standard.

3. For the small number of reactions that produce extraneous bands, a small amount of DNA from bands with the
correct size can be isolated by dipping a sterile 10-µl tip into the band while viewing through a UV Transilluminator.
The small amount of agarose gel (with the DNA fragment) is used in the labeling reaction.

(preferably overnight).

EXCEPTION: Arabidopsis and yeast DNA are digested in an appropriate volume; they don't have to be precipitated.

10. Resuspend the DNA in an appropriate volume of TE (e.g., 22 µl x 50 blots = 1100 µl) and an appropriate volume
of 10X loading dye (e.g., 2.4 µl x 50 blots = 120 µl). Be careful in pipetting the loading dye - it is viscous. Be sure
you are pipetting the correct volume.

Tables 



Some guide points in digesting genomic DNA.

Species       Genome Size   Size Relative     Genome Equivalent to      Amount of DNA
                            to Arabidopsis    2 µg Arabidopsis DNA      per blot
Arabidopsis   120 Mb        1X                1X                        2 µg
Brassica      1,100 Mb      9.2X              0.54X                     10 µg
Corn          2,800 Mb      23.3X             0.43X                     20 µg
Cotton        2,300 Mb      19.2X             0.52X                     20 µg
Oat           11,300 Mb     94X               0.11X                     20 µg
Rice          400 Mb        3.3X              0.75X                     5 µg
Soybean       1,100 Mb      9.2X              0.54X                     10 µg
Sugarbeet     758 Mb        6.3X              0.8X                      10 µg
Sweetclover   1,100 Mb      9.2X              0.54X                     10 µg
Wheat         16,000 Mb     133X              0.08X                     20 µg
Yeast         15 Mb         0.12X             1X                        0.25 µg
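The two derived columns of the table follow from the genome size and the amount of DNA loaded: the relative size is the genome size divided by the 120 Mb Arabidopsis genome, and the genome equivalent is the loaded mass (as a multiple of 2 µg) divided by that relative size. The sketch below recomputes both; function and variable names are illustrative, and because the table rounds the relative size before dividing, the results agree with the tabulated values only to within about 0.01.

```python
ARABIDOPSIS_MB = 120.0   # reference genome size
ARABIDOPSIS_UG = 2.0     # reference amount of DNA per blot

# species: (genome size in Mb, µg of DNA per blot, tabulated genome equivalent)
GENOMES = {
    "Arabidopsis": (120, 2, 1.0),
    "Brassica": (1100, 10, 0.54),
    "Corn": (2800, 20, 0.43),
    "Cotton": (2300, 20, 0.52),
    "Oat": (11300, 20, 0.11),
    "Rice": (400, 5, 0.75),
    "Soybean": (1100, 10, 0.54),
    "Sugarbeet": (758, 10, 0.8),
    "Sweetclover": (1100, 10, 0.54),
    "Wheat": (16000, 20, 0.08),
    "Yeast": (15, 0.25, 1.0),
}


def relative_size(size_mb):
    """Genome size as a multiple of the Arabidopsis genome."""
    return size_mb / ARABIDOPSIS_MB


def genome_equivalent(size_mb, ug_per_blot):
    """Genome copies loaded, relative to 2 µg of Arabidopsis DNA."""
    return (ug_per_blot / ARABIDOPSIS_UG) / relative_size(size_mb)


if __name__ == "__main__":
    for species, (size_mb, ug, _) in GENOMES.items():
        print(f"{species:12s} {relative_size(size_mb):7.1f}X "
              f"{genome_equivalent(size_mb, ug):5.2f}X")
```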

Protocol for Southern Blot Analysis

[2335] The digested DNA samples are electrophoresed in 1% agarose gels in 1x TPE buffer. Low voltage, overnight
separations are preferred. The gels are stained with EtBr and photographed.

1. For blotting the gels, first incubate the gel in 0.25 N HCl (with gentle shaking) for about 15 min.

2. Then briefly rinse with water. The DNA is denatured by 2 incubations. Incubate (with shaking) in 0.5 M NaOH
in 1.5 M NaCl for 15 min.

3. The gel is then briefly rinsed in water and neutralized by incubating twice (with shaking) in 1.5 M Tris pH 7.5 in
1.5 M NaCl for 15 min.

4. A nylon membrane is prepared by soaking it in water for at least 5 min, then in 6X SSC for at least 15 min
before use. (20x SSC is 175.3 g NaCl, 88.2 g sodium citrate per liter, adjusted to pH 7.0.)

5. The nylon membrane is placed on top of the gel and all bubbles in between are removed. The DNA is blotted
from the gel to the membrane using an absorbent medium, such as paper toweling and 6x SSC buffer. After the
transfer, the membrane may be lightly brushed with a gloved hand to remove any agarose sticking to the surface.

6. The DNA is then fixed to the membrane by UV crosslinking and baking at 80°C. The membrane is stored at 4°C
until use.

1) 94°C for 10 min.
2) 5 cycles:    95°C - 30 sec
                61°C - 1 min
                73°C - 5 min
3) 5 cycles:    95°C - 30 sec
                59°C - 1 min
                75°C - 5 min
4) 25 cycles:   95°C - 30 sec
                51°C - 1 min
                73°C - 5 min
5) 72°C for 8 min. The reactions are terminated by chilling to 4°C (hold).



3. The products are analyzed by electrophoresis in a 1% agarose gel, comparing to an aliquot of the unlabelled
probe starting material.

4. The amount of DIG-labeled probe is determined as follows:

Make serial dilutions of the diluted control DNA in dilution buffer (TE: 10 mM Tris and 1 mM EDTA, pH 8) as
shown in the following table:



DIG-labeled control DNA starting conc.    Stepwise Dilution       Final Conc. (Dilution Name)
5 ng/µl                                   1 µl in 49 µl TE        100 pg/µl (A)
100 pg/µl (A)                             25 µl in 25 µl TE       50 pg/µl (B)
50 pg/µl (B)                              25 µl in 25 µl TE       25 pg/µl (C)
25 pg/µl (C)                              20 µl in 30 µl TE       10 pg/µl (D)



a. Serial dilutions of a DIG-labeled standard DNA ranging from 100 pg to 10 pg are spotted onto a positively
charged nylon membrane, marking the membrane lightly with a pencil to identify each dilution.

b. Serial dilutions (e.g., 1:50, 1:2500, 1:10,000) of the newly labeled DNA probe are spotted.

c. The membrane is fixed by UV crosslinking.

d. The membrane is wetted with a small amount of maleate buffer and then incubated in 1% blocking solution
for 15 min at room temp.

e. The labeled DNA is then detected using alkaline phosphatase conjugated anti-DIG antibody (Boehringer
Mannheim, Indianapolis, IN, cat. no. 1093274) and an NBT substrate according to the manufacturer's instruction.

f. Spot intensities of the control and experimental dilutions are then compared to estimate the concentration
of the PCR-DIG-labeled probe.
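The dilution series of the control DNA, and the back-calculation used in step f, are simple ratio arithmetic. The sketch below is an illustration under two assumptions that the protocol does not state explicitly: equal volumes are spotted for standard and probe, and a probe spot is judged "equal" to one standard spot by eye. The helper names `dilute` and `estimate_probe_ng_per_ul` are hypothetical.

```python
def dilute(conc, sample_ul, diluent_ul):
    """Concentration after mixing sample_ul of sample with diluent_ul of TE."""
    return conc * sample_ul / (sample_ul + diluent_ul)


# (dilution name, µl taken from the previous tube, µl of TE), as in the table
STEPS = [("A", 1, 49), ("B", 25, 25), ("C", 25, 25), ("D", 20, 30)]


def control_series(start_pg_per_ul=5000.0):
    """Concentrations (pg/µl) of dilutions A-D, starting from 5 ng/µl control DNA."""
    series = {}
    conc = start_pg_per_ul
    for name, sample_ul, diluent_ul in STEPS:
        conc = dilute(conc, sample_ul, diluent_ul)
        series[name] = conc
    return series


def estimate_probe_ng_per_ul(matched_standard_pg_per_ul, probe_dilution_factor):
    """Undiluted probe concentration if a 1:N probe dilution matches a standard spot."""
    return matched_standard_pg_per_ul * probe_dilution_factor / 1000.0


if __name__ == "__main__":
    print(control_series())
    # Hypothetical reading: the 1:2500 probe spot looks like standard B (50 pg/µl).
    print(estimate_probe_ng_per_ul(50.0, 2500), "ng/µl")
```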

D. Prehybridization and Hybridization of Southern Blots

Solutions:
[2341]

100% Formamide                        purchased from Gibco
20X SSC                               (1X = 0.15 M NaCl, 0.015 M Na3Citrate)
                                      per L: 175 g NaCl
                                             87.5 g Na3Citrate·2H2O
20% Sarkosyl (N-lauroyl-sarcosine)
20% SDS (sodium dodecyl sulphate)

C. Protocol for PCR-DIG-Labeling of DNA

Solutions:
[2339]

Reagents in PCR reactions (diluted PCR products, 10X PCR Buffer, 50 mM MgCl2, 5 U/µl Platinum Taq Polymerase,
and the primers)

10X dNTP + DIG-11-dUTP [1:5]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.65 mM dTTP, 0.35 mM DIG-11-dUTP)
10X dNTP + DIG-11-dUTP [1:10]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.81 mM dTTP, 0.19 mM DIG-11-dUTP)
10X dNTP + DIG-11-dUTP [1:15]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.875 mM dTTP, 0.125 mM DIG-11-dUTP)
TE buffer (10 mM Tris, 1 mM EDTA, pH 8)

Maleate buffer: In 700 ml of deionized distilled water, dissolve 11.61 g maleic acid and 8.77 g NaCl. Add NaOH to
adjust the pH to 7.5. Bring the volume to 1 L. Stir for 15 min and sterilize.

10% blocking solution: In 80 ml deionized distilled water, dissolve 1.16 g maleic acid. Next add NaOH to adjust
the pH to 7.5. Add 10 g of the blocking reagent powder (Boehringer Mannheim, Indianapolis, IN, Cat. no. 1096176).
Heat to 60°C while stirring to dissolve the powder. Adjust the volume to 100 ml with water. Stir and sterilize.

1% blocking solution: Dilute the 10% stock to 1% using the maleate buffer.

Buffer 3 (100 mM Tris, 100 mM NaCl, 50 mM MgCl2, pH 9.5). Prepared from autoclaved solutions of 1 M Tris pH
9.5, 5 M NaCl, and 1 M MgCl2 in autoclaved distilled water.
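The weighed amounts in the maleate buffer recipe can be cross-checked against the nominal 0.1 M maleic acid and 0.15 M NaCl by dividing the mass by the molar mass and the final volume. The sketch below does this; the molar masses (116.07 g/mol for maleic acid, 58.44 g/mol for NaCl) are standard values supplied here rather than taken from the text, and the function name is illustrative.

```python
# Molar masses in g/mol (standard values, not stated in the protocol).
MOLAR_MASS = {"maleic acid": 116.07, "NaCl": 58.44}


def molarity(grams, compound, final_volume_l):
    """Molar concentration after dissolving `grams` of `compound` to the final volume."""
    return grams / MOLAR_MASS[compound] / final_volume_l


if __name__ == "__main__":
    # Maleate buffer: 11.61 g maleic acid and 8.77 g NaCl brought to 1 L.
    print(round(molarity(11.61, "maleic acid", 1.0), 3), "M maleic acid")
    print(round(molarity(8.77, "NaCl", 1.0), 3), "M NaCl")
    # 10% blocking solution: 1.16 g maleic acid brought to 100 ml.
    print(round(molarity(1.16, "maleic acid", 0.1), 3), "M maleic acid")
```

Both recipes come out at about 0.1 M maleic acid, which suggests the 10% blocking solution uses the same maleate concentration as the buffer, only without NaCl.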

Procedure:

[2340]

1. PCR reactions are performed in 25 µl volumes containing:

PCR buffer                     1X
MgCl2                          1.5 mM
10X dNTP + DIG-11-dUTP         1X (please see the note below)
Platinum Taq Polymerase        1 unit
10 pg probe DNA
10 pmol primer 1

Note:

                                   Use for:
10X dNTP + DIG-11-dUTP (1:5)       <1 kb
10X dNTP + DIG-11-dUTP (1:10)      1 kb to 1.8 kb
10X dNTP + DIG-11-dUTP (1:15)      >1.8 kb

2. The PCR reaction uses the following amplification cycles:

Washing buffer:                      Maleic acid buffer with 0.3% (v/v) Tween 20.

Blocking stock solution:             10% blocking reagent in Buffer 1. Dissolve (10X concentration) blocking reagent
                                     powder (Boehringer Mannheim, Indianapolis, IN, cat. no. 1096176) by constantly
                                     stirring on a 65°C heating block or heat in a microwave, autoclave and store at 4°C.

Buffer 2 (1X blocking solution):     Dilute the stock solution 1:10 in Buffer 1.

Detection buffer:                    0.1 M Tris, 0.1 M NaCl, pH 9.5

Procedure:

[2344]



1. After the post-hybridization wash the blots are briefly rinsed (1-5 min.) in the maleate washing buffer with gentle
shaking.

2. Then the membranes are incubated for 30 min. in Buffer 2 with gentle shaking.

3. Anti-DIG-AP conjugate (Boehringer Mannheim, Indianapolis, IN, cat. no. 1093274) at 75 mU/ml (1:10,000) in
Buffer 2 is used for detection. 75 ml of solution can be used for 3 blots.

4. The membrane is incubated for 30 min. in the antibody solution with gentle shaking.

5. The membranes are washed twice in washing buffer with gentle shaking. About 250 ml is used per wash for 3
blots.

6. The blots are equilibrated for 2-5 min in 60 ml detection buffer.

7. Dilute CSPD (1:200) in detection buffer. (This can be prepared ahead of time and stored in the dark at 4°C.)
The following steps must be done individually. Bags (one for detection and one for exposure) are generally cut
and ready before doing the following steps.

8. The blot is carefully removed from the detection buffer and excess liquid removed without drying the membrane.
The blot is immediately placed in a bag and 1.5 ml of CSPD solution is added. The CSPD solution can be spread
over the membrane. Bubbles present at the edge and on the surface of the blot are typically removed by gentle
rubbing. The membrane is incubated for 5 min. in CSPD solution.

9. Excess liquid is removed and the membrane is blotted briefly (DNA side up) on Whatman 3MM paper. Do not
let the membrane dry completely.

10. Seal the damp membrane in a hybridization bag and incubate for 10 min at 37°C to enhance the luminescent
reaction.

11. Expose for 2 hours at room temperature to X-ray film. Multiple exposures can be taken. Luminescence continues
for at least 24 hours and signal intensity increases during the first hours.
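The antibody (1:10,000) and CSPD (1:200) working solutions in steps 3 and 7 reduce to the same calculation. The sketch below reads "1:N" as one part stock in N parts final volume, which is an assumption about the notation, and the function name is illustrative.

```python
def stock_volume_ul(final_volume_ml, dilution_factor):
    """µl of stock needed for a 1:dilution_factor dilution in the given final volume."""
    return final_volume_ml * 1000.0 / dilution_factor


if __name__ == "__main__":
    # Step 3: 75 ml of antibody solution (enough for 3 blots) at 1:10,000.
    print(stock_volume_ul(75, 10_000), "µl anti-DIG-AP conjugate")
    # Steps 7-8: 1.5 ml of CSPD solution per blot at 1:200, for 3 blots.
    print(stock_volume_ul(3 * 1.5, 200), "µl CSPD")
```

Under the same reading, a working concentration of 75 mU/ml at 1:10,000 corresponds to a conjugate stock of 750 U/ml.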

Example 3: Transformation of Carrot Cells

[2345] Transformation of plant cells can be accomplished by a number of methods, as described above. Similarly,
a number of plant genera can be regenerated from tissue culture following transformation. Transformation and
regeneration of carrot cells as described herein is illustrative.

[2346] Single cell suspension cultures of carrot (Daucus carota) cells are established from hypocotyls of cultivar
Early Nantes in B5 growth medium (O.L. Gamborg et al., Plant Physiol. 45:372 (1970)) plus 2,4-D and 15 mM CaCl2
(B5-44 medium) by methods known in the art. The suspension cultures are subcultured by adding 10 ml of the
suspension culture to 40 ml of B5-44 medium in 250 ml flasks every 7 days and are maintained in a shaker at 150 rpm at
27°C in the dark.

10% Blocking Reagent: In 80 ml deionized distilled water, dissolve 1.16 g maleic acid. Next, add NaOH to adjust
the pH to 7.5. Add 10 g of the blocking reagent powder. Heat to 60°C while stirring to dissolve the powder. Adjust
the volume to 100 ml with water. Stir and sterilize.



Prehybridization Mix:

Final Concentration    Components          Volume (per 100 ml)    Stock
50%                    Formamide           50 ml                  100%
5X                     SSC                 25 ml                  20X
0.1%                   Sarkosyl            0.5 ml                 20%
0.02%                  SDS                 0.1 ml                 20%
2%                     Blocking Reagent    20 ml                  10%
                       Water               4.4 ml
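Each volume in the prehybridization mix follows from the stock and final concentrations (volume = total x final / stock), with water making up the remainder. The sketch below recomputes the recipe for any batch size; the names `COMPONENTS` and `prehyb_volumes` are illustrative and not part of the protocol.

```python
# (component, stock concentration, final concentration), same unit within each row
COMPONENTS = [
    ("Formamide (%)", 100, 50),
    ("SSC (X)", 20, 5),
    ("Sarkosyl (%)", 20, 0.1),
    ("SDS (%)", 20, 0.02),
    ("Blocking Reagent (%)", 10, 2),
]


def prehyb_volumes(total_ml):
    """Volumes (ml) of each stock, plus water, for total_ml of prehybridization mix."""
    volumes = {name: total_ml * final / stock for name, stock, final in COMPONENTS}
    volumes["Water"] = total_ml - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for name, ml in prehyb_volumes(100).items():
        print(f"{name:22s} {ml:6.2f} ml")
```

For a 100 ml batch this reproduces the tabulated volumes, including the 4.4 ml of water.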





General Procedures:
[2342]

1. Place the blot in a heat-sealable plastic bag and add an appropriate volume of prehybridization solution (30 ml/
100 cm2) at room temperature. Seal the bag with a heat sealer, avoiding bubbles as much as possible. Lay down
the bags in a large plastic tray (one tray can accommodate at least 4-5 bags). Ensure that the bags are lying flat
in the tray so that the prehybridization solution is evenly distributed throughout the bag. Incubate the blot for at
least 2 hours with gentle agitation using a waver shaker.

2. Denature DIG-labeled DNA probe by incubating for 10 min. at 98°C using the PCR machine and immediately
cool it to 4°C.

3. Add probe to prehybridization solution (25 ng/ml; 30 ml = 750 ng total probe) and mix well but avoid foaming.
Bubbles may lead to background.

4. Pour off the prehybridization solution from the hybridization bags and add new prehybridization and probe
solution mixture to the bags containing the membrane.

5. Incubate with gentle agitation for at least 16 hours.

6. Proceed to medium stringency post-hybridization wash:

Three times for 20 min. each with gentle agitation using 1X SSC, 1% SDS at 60°C.

All wash solutions must be prewarmed to 60°C. Use about 100 ml of wash solution per membrane.

To avoid background, keep the membranes fully submerged to avoid drying in spots; agitate sufficiently to
avoid having membranes stick to one another.

7. After the wash, proceed to immunological detection and CSPD development.
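The solution volume in step 1 scales with membrane area (30 ml per 100 cm2) and the probe mass in step 3 scales with that volume (25 ng/ml), so a 100 cm2 blot needs 30 ml and 750 ng of probe. A minimal sketch of the arithmetic follows; the function name and the example blot dimensions are hypothetical.

```python
def hybridization_plan(blot_area_cm2, ml_per_100cm2=30.0, probe_ng_per_ml=25.0):
    """Return (solution volume in ml, probe mass in ng) for a blot of the given area."""
    volume_ml = blot_area_cm2 / 100.0 * ml_per_100cm2
    probe_ng = volume_ml * probe_ng_per_ml
    return volume_ml, probe_ng


if __name__ == "__main__":
    print(hybridization_plan(100))      # the case quoted in the procedure
    print(hybridization_plan(10 * 15))  # a hypothetical 10 cm x 15 cm membrane
```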

E. Procedure for Immunological Detection with CSPD

Solutions:

[2343]

Buffer 1:                            Maleic acid buffer (0.1 M maleic acid, 0.15 M NaCl; adjusted to pH 7.5 with NaOH)

7. The isolated nucleic acid molecule of any one of claims 1-5, wherein said nucleic acid is capable of functioning as
a promoter, a 3' end termination sequence, an untranslated region (UTR), or as a regulatory sequence.

8. The isolated nucleic acid molecule of claim 7, wherein said nucleic acid is a promoter and comprises a sequence
selected from the group consisting of a TATA box sequence, a CAAT box sequence, a motif of GCAATCG or any
transcription-factor binding sequence, and any combination thereof.

9. The isolated nucleic acid molecule of claim 7, wherein the nucleic acid sequence is a regulatory sequence which
is capable of promoting seed-specific expression, embryo-specific expression, ovule-specific expression,
tapetum-specific expression or root-specific expression of a sequence or any combination thereof.

10. A vector construct comprising a nucleic acid molecule according to any one of claims 1-9, wherein said nucleic
acid molecule is heterologous to any element in said vector construct.

11. A vector construct according to claim 10 comprising:

(a) a first nucleic acid having a regulatory sequence capable of causing transcription and/or translation; and

(b) a second nucleic acid having the sequence of said isolated nucleic acid molecule according to any one of
claims 1-4;

wherein said first and second nucleic acids are operably linked and wherein said second nucleic acid is
heterologous to any element in said vector construct.

12. The vector construct according to claim 11, wherein said first nucleic acid is native to said second nucleic acid.

13. The vector construct according to claim 11, wherein said first nucleic acid is heterologous to said second nucleic
acid.

14. A vector construct according to claim 10 comprising:

(c) a first nucleic acid having the sequence of said isolated nucleic acid molecule according to claim
7; and

(d) a second nucleic acid;

wherein said first and second nucleic acids are operably linked and wherein said first nucleic acid is heterologous
to any element in said vector construct.

15. The vector construct according to claim 14, wherein said first nucleic acid is native to said second nucleic acid.

16. The vector construct according to claim 14, wherein said first nucleic acid is heterologous to said second nucleic
acid.

17. A host cell comprising an isolated nucleic acid molecule according to any one of claims 1-4, wherein said nucleic
acid molecule is flanked by exogenous sequence.

18. A host cell comprising a vector construct of any one of claims 10-16.

19. An isolated polypeptide comprising an amino acid sequence

(a) exhibiting at least 40% sequence identity of an amino acid sequence encoded by a sequence shown in
REF and/or SEQ Table 1 or 2 or a fragment thereof; and

(b) capable of exhibiting at least one of the biological activities of the polypeptide encoded by said nucleotide
sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

20. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 75% sequence identity
to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

21. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 85% sequence identity

[2347] The suspension culture cells are transformed with exogenous DNA as described by Z. Chen et al., Plant Mol.
Biol. 36:163 (1998). Briefly, 4-days post-subculture cells are incubated with cell wall digestion solution containing 0.4
M sorbitol, 2% driselase, 5 mM MES (2-[N-Morpholino]ethanesulfonic acid) pH 5.0 for 5 hours. The digested cells are
pelleted gently at 60 xg for 5 min. and washed twice in W5 solution containing 154 mM NaCl, 5 mM KCl, 125 mM CaCl2
and 5 mM glucose, pH 6.0. The protoplasts are suspended in MC solution containing 5 mM MES, 20 mM CaCl2, 0.5
M mannitol, pH 5.7 and the protoplast density is adjusted to about 4x10^ protoplasts per ml.

[2348] 15-60 µg of plasmid DNA is mixed with 0.9 ml of protoplasts. The resulting suspension is mixed with 40%
polyethylene glycol (MW 8000, PEG 8000), by gentle inversion a few times at room temperature for 5 to 25 min.
Protoplast culture medium known in the art is added into the PEG-DNA-protoplast mixture. Protoplasts are incubated
in the culture medium for 24 hours to 5 days and cell extracts can be used for assay of transient expression of the
introduced gene. Alternatively, transformed cells can be used to produce transgenic callus, which in turn can be used
to produce transgenic plants, by methods known in the art. See, for example, Nomura and Komamine, Plt. Phys. 79:
988-991 (1985), Identification and Isolation of Single Cells that Produce Somatic Embryos in Carrot Suspension
Cultures.

[2349] The invention being thus described, it will be apparent to one of ordinary skill in the art that various
modifications of the materials and methods for practicing the invention can be made. Such modifications are to be considered
within the scope of the invention as defined by the following claims.

[2350] Each of the references from the patent and periodical literature cited herein is hereby expressly incorporated
in its entirety by such citation.

Claims

1. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which encodes an amino
acid sequence exhibiting at least 40% sequence identity to an amino acid sequence encoded by

(a) a nucleotide sequence described in REF and/or SEQ Table 1 or 2 or a fragment thereof; or

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

2. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which exhibits at least
65% sequence identity to

(a) a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof; or

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

3. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which exhibits at least
65% sequence identity to a gene comprising

(a) a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof; or

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

4. An isolated nucleic acid molecule which is the reverse of the isolated nucleotide sequence according to any one
of claims 1-3, such that the reverse nucleotide sequence has a sequence order which is the reverse of the sequence
order of said isolated nucleotide sequence according to any one of claims 1-3.

5. An isolated nucleic acid molecule comprising a nucleic acid capable of hybridizing to a nucleic acid having a
sequence selected from the group consisting of:

(a) a nucleotide sequence which is shown in REF and/or SEQ Table 1 or 2; and

(b) a nucleotide sequence which is complementary to a nucleotide sequence shown in REF and/or SEQ Table
1 or 2;

under conditions that permit formation of a nucleic acid duplex at a temperature from about 40°C and 48°C below
the melting temperature of the nucleic acid duplex.

6. The nucleic acid molecule according to any one of claims 1-5, wherein said nucleic acid comprises an open reading
frame.

27. A method for detecting a nucleic acid in a sample which comprises:

(a) providing an isolated nucleic acid molecule according to any one of claims 1-5;

(b) contacting said isolated nucleic acid molecule with a sample under conditions which permit a comparison
of the sequence of said isolated nucleic acid molecule with the sequence of DNA in said sample; and

(c) analyzing the result of said comparison.

28. The method according to claim 27, wherein said isolated nucleic acid molecule and said sample are contacted
under conditions which permit the formation of a duplex between complementary nucleic acid sequences.

29. A plant or cell of a plant which comprises a nucleic acid molecule according to any one of claims 1-4 which is
exogenous to said plant or plant cell.

30. A plant or cell of a plant which comprises a nucleic acid molecule according to any one of claims 1-4, wherein said
nucleic acid molecule is heterologous to said plant or said cell of a plant.

31. A plant or cell of a plant which has been transformed with a nucleic acid molecule according to any one of claims 1-4.

32. A plant or cell of a plant which comprises a vector construct according to any one of claims 10-16.

33. A plant or cell of a plant which has been transformed with a vector construct according to any one of claims 10-16.

34. A plant which has been regenerated from a plant cell according to any one of claims 29-33.

to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

22. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 90% sequence identity
to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

23. An antibody capable of binding the isolated polypeptide of any one of claims 19-22.

24. A method of introducing an isolated nucleic acid into a host cell comprising:

(a) providing an isolated nucleic acid molecule according to any one of claims 1-4; and

(b) contacting said isolated nucleic acid with said host cell under conditions that permit insertion of said nucleic
acid into said host cell.

25. A method of transforming a host cell which comprises contacting a host cell with a vector construct according to
any one of claims 10-16.

26. A method of modulating transcription and/or translation of a nucleic acid in a host cell comprising:

(a) providing the host cell of claim 24 or 25; and

(b) culturing said host cell under conditions that permit transcription or translation.
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